comp.lang.ruby
Nokogiri not getting html body sometimes
Jarmo Pertman
5/20/2009 5:45:00 PM
I'm using Mechanize to fetch an IMDb page and then Nokogiri's Node#search
method to extract some information from it. I've stumbled onto one
special case where #search doesn't work properly; all the other pages
I've tried so far work as expected.
It seems that some special characters are causing trouble for
Nokogiri: when I printed the document itself, it output
only half of the <head> tag and no body tag at all!
Anyway, here is a code snippet that I'd expect to output "false" four
times. Instead, it outputs false, false, true, false. Try it with some
other IMDb URL and it works fine.
require 'mechanize'
mech = WWW::Mechanize.new {|agent| agent.user_agent_alias = 'Windows Mozilla'}
mech.get("http://www.imdb.com/title/tt1092...") do |page|
  puts page.search("/html").empty?
  puts page.search("/html/head").empty?
  puts page.search("/html/body").empty?
  puts page.body.empty?
end
What could be the problem?
I'm using ruby 1.8.6 (2007-09-24 patchlevel 111) [i386-mswin32]
--
Posted via http://www.ruby-...
2 Answers
Lui Core
5/21/2009 2:29:00 PM
I think you'd better set the encoding first.
mech.get("http://www.imdb.com/title/tt1092...") do |page|
  page.encoding = 'ISO-8859-1'
  # ... the rest of your code
end
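One possible reading of why this helps, sketched with plain Ruby strings only (no Mechanize or Nokogiri, and it needs Ruby 2.0 or later rather than the 1.8.6 from the question): a byte that is a perfectly good character in ISO-8859-1 can be an invalid sequence when the same data is read as UTF-8, and a parser that guesses the wrong encoding may give up at that point. The sample bytes below are made up for illustration and are not taken from the IMDb page.

```ruby
# Hypothetical sample: the bytes of "Amélie" as an ISO-8859-1 server would send them.
# 0xE9 is "é" in ISO-8859-1.
raw = "Am\xE9lie".b

# The same bytes, labelled with two different encodings.
as_utf8   = raw.dup.force_encoding(Encoding::UTF_8)
as_latin1 = raw.dup.force_encoding(Encoding::ISO_8859_1)

# In UTF-8, 0xE9 starts a multi-byte sequence, but "l" is not a continuation byte.
utf8_ok   = as_utf8.valid_encoding?
# Every byte value is a valid ISO-8859-1 character.
latin1_ok = as_latin1.valid_encoding?

# With the right source encoding, transcoding to UTF-8 works.
converted = as_latin1.encode(Encoding::UTF_8)

puts utf8_ok
puts latin1_ok
puts converted
```

Whether this is exactly what went wrong on that page is an assumption; the thread only confirms that setting page.encoding fixed it.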
Jarmo Pertman
5/21/2009 4:32:00 PM
Thank you! It did the trick.
Best regards,
Jarmo
Lui Core wrote:
> I think you'd better set the encoding first.
>
> mech.get("http://www.imdb.com/title/tt1092...") do |page|
>   page.encoding = 'ISO-8859-1'
>   # ... the rest of your code
> end