comp.lang.ruby
General Nokogiri problem
Srijayanth Sridhar
5/7/2009 6:45:00 AM
[Note: parts of this message were removed to make it a legal post.]
Hello,
On several sites (probably malformed HTML/JavaScript/XML/general parsing
hell) I have the following problem.
For example:
moonwolf@trantor:~/ruby$ irb
irb(main):001:0> ['rubygems','nokogiri','hpricot','open-uri'].each { |r| require r }
=> ["rubygems", "nokogiri", "hpricot", "open-uri"]
irb(main):002:0> doc=Nokogiri(open("http://maps.google...."))
=> <?xml version="1.0"?>
<!DOCTYPE html>
<html/>
irb(main):003:0> doc/"a"
=>
Same with Nokogiri.Hpricot:
irb(main):004:0> doc=Nokogiri.Hpricot(open("http://maps.google...."))
=> <?xml version="1.0"?>
<!DOCTYPE html>
<html/>
However with regular Hpricot:
irb(main):009:0> (Hpricot(open("http://maps.google...."))/"a").size
=> 53
(The full post is of course too long, so I have just shown something simpler.)
Hpricot by itself of course works. I tried looking, and there's not much by
way of documentation or blogs on something like this.
Any suggestions or explanations will be welcome, as I like Nokogiri's speed
very much.
I am using:
moonwolf@trantor:~/ruby$ gem list --local | grep -i nokogiri
nokogiri (1.2.3)
moonwolf@trantor:~/ruby$ ruby --version
ruby 1.8.6 (2008-03-03 patchlevel 114) [i686-linux]
Jayanth
3 Answers
Aaron Patterson
5/7/2009 7:03:00 AM
Nokogiri detects the XML header and parses it as XML. If you force it
to use the HTML parser, you may be more successful:
>> (Nokogiri::HTML(open("http://maps.google...."))/'a').length
=> 53
>>
--
Aaron Patterson
http://tenderlovem...
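To illustrate the kind of header sniffing Aaron describes, here is a rough sketch in plain Ruby. It is not Nokogiri's actual detection code: the helper name guess_parser and the sample markup are made up for illustration, and the rule (a leading "<?xml" declaration means XML, anything else means HTML) is an assumption. It only shows why a page that opens with an XML declaration could end up in the XML parser.

```ruby
# Hypothetical helper, NOT Nokogiri's real logic: choose a parser by
# looking at how the document starts.
def guess_parser(markup)
  # Leading whitespace before the declaration is ignored in this sketch.
  head = markup.lstrip
  # Assumption: an XML declaration at the top means "treat as XML".
  head.start_with?("<?xml") ? :xml : :html
end

# Made-up sample documents; the first mimics the prolog seen in the thread.
with_prolog = <<~DOC
  <?xml version="1.0"?>
  <!DOCTYPE html>
  <html><body><a href="/x">link</a></body></html>
DOC

plain_html = <<~DOC
  <!DOCTYPE html>
  <html><body><a href="/x">link</a></body></html>
DOC

puts "with_prolog: #{guess_parser(with_prolog)}"  # prints "with_prolog: xml"
puts "plain_html: #{guess_parser(plain_html)}"    # prints "plain_html: html"
```

Calling Nokogiri::HTML(...) explicitly, as in Aaron's reply, skips any such guess and always uses the HTML parser.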
Srijayanth Sridhar
5/7/2009 7:06:00 AM
Thanks Aaron.
Jayanth
Srijayanth Sridhar
5/7/2009 7:08:00 AM
Whoops,
irb(main):015:0> (Nokogiri::HTML(open("http://maps.g..."))/'a').length
=> 0
Not sure what the deal is.
Jayanth