comp.lang.ruby
Hpricot not returning the right html??
Hannes Rammer
10/27/2008 3:46:00 PM
Hi, when I fetch this URL
http://www.basketball-bund.net/index.jsp?Action=100&V...
it shows a table with basketball info, starting at page 1.
To get to page 2, this is the working URL:
http://www.basketball-bund.net/index.jsp?Action=100&V...&startrow=10
As the parameter says, it starts with the 10th result.
If I enter that URL into the browser address bar it works fine, but
when I fetch the HTML for Hpricot I just get the first page back.
I've found out that if the startrow bit is wrong, the site always shows the
first page, but it seems to be right, since it works in the browser.
I get the same problem using
URI.parse
Here is my code:
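require 'rubygems'
require 'open-uri'
require 'hpricot'
require 'iconv'

q = 'http://www.basketball-bund.net/index.jsp?Action=100&V...&viewid=&startrow=10'
# fetch the page (open-uri)
f = open(q)
# convert from the charset the server reports to UTF-8, then parse
doc = Hpricot(Iconv.conv('utf-8', f.charset, f.read))
form = doc.search("//form[@name=ligaliste]")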
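
One way to rule out a malformed startrow parameter is to build the URL from its parts and print it before fetching. This is only a minimal sketch: the base URL is a hypothetical stand-in (the real one is truncated in this post), and the page size of 10 is an assumption taken from the startrow=10 example.

```ruby
require 'uri'

# Hypothetical stand-in for the truncated URL in the post.
BASE = 'http://example.com/index.jsp?Action=100&viewid='

# Assumes 10 rows per page, so page 1 -> startrow 0, page 2 -> startrow 10.
def page_url(base, page, per_page = 10)
  uri = URI.parse(base)
  params = URI.decode_www_form(uri.query || '')
  # drop any startrow already present, then append the new one
  params.reject! { |key, _| key == 'startrow' }
  params << ['startrow', ((page - 1) * per_page).to_s]
  uri.query = URI.encode_www_form(params)
  uri.to_s
end

puts page_url(BASE, 2)
```

If the printed URL matches the one that works in the browser, the query string is probably not the cause.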
Can anyone help me, please?
Thanks
--
Posted via
http://www.ruby-...
1 Answer
Hannes Rammer
11/3/2008 11:33:00 AM
Hmm, no one replied... well, in case someone has the same problem, I
have found the solution:
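require 'rubygems'
require 'mechanize'
require 'hpricot'
require 'iconv'

# search_string is defined elsewhere (presumably the extra query parameters)
q = "http://www.basketball-bund.net/i...#{search_string}"
agent = WWW::Mechanize.new
# first request: load the page once so the agent picks up the cookies
agent.get("http://www.basketball-bund.net/index.jsp?Action=100&Verband...")
# second request: the actual search, sent with the same agent
doc = agent.get(q)
doc = doc.search('body').to_html
# convert ISO-8859-15 to UTF-8 (Iconv.conv returns a String, Iconv.iconv an Array)
doc = Iconv.conv("UTF-8", "ISO-8859-15", doc)
# make it hpricot
doc = Hpricot(doc)
## end crawling
@q = q
form = doc.search("//form")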
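
A note on the conversion step: Iconv was removed from the standard library in Ruby 2.0. On Ruby 1.9 and later, String#encode can do the same job. A minimal sketch, with made-up sample bytes standing in for the page body:

```ruby
# Sample bytes as an ISO-8859-15 page would deliver them (made up for illustration):
# "f", 0xFC (u-umlaut), "r 5 ", 0xA4 (euro sign in ISO-8859-15)
raw = [0x66, 0xFC, 0x72, 0x20, 0x35, 0x20, 0xA4].pack('C*')

# Tell Ruby what the bytes are, then transcode them to UTF-8.
utf8 = raw.force_encoding('ISO-8859-15').encode('UTF-8')

puts utf8
```

The same two calls should work on the HTML string from the page, assuming the page really is ISO-8859-15.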
It seems that it's because of cookies or something: I needed to
load the page once before running my own search. That's why I call
agent.get twice.
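
To illustrate the cookie theory, here is a self-contained sketch using only the standard library. Everything in it is an assumption for demonstration purposes: the tiny local server is a hypothetical stand-in for the real site, and the JSESSIONID cookie name is a guess based on the site being a JSP application. The server serves page 1 to any request without a session cookie, whatever startrow says, which matches the behaviour described above.

```ruby
require 'net/http'
require 'socket'

# Tiny local stand-in for a session-based site (hypothetical, not the real server).
server = TCPServer.new('127.0.0.1', 0)
port = server.addr[1]

Thread.new do
  loop do
    client = server.accept
    request_line = client.gets.to_s
    headers = {}
    # read request headers up to the blank line
    while (line = client.gets) && line != "\r\n"
      name, value = line.chomp.split(': ', 2)
      headers[name.downcase] = value
    end
    has_session = headers['cookie'].to_s.include?('JSESSIONID=')
    wants_page2 = request_line.include?('startrow=10')
    body = has_session && wants_page2 ? 'page 2' : 'page 1'
    client.write "HTTP/1.1 200 OK\r\n"
    # hand out a session cookie to clients that do not have one yet
    client.write "Set-Cookie: JSESSIONID=abc123; Path=/\r\n" unless has_session
    client.write "Content-Length: #{body.bytesize}\r\nConnection: close\r\n\r\n#{body}"
    client.close
  end
end

# Fetch a path, optionally sending a cookie back to the server.
def fetch(port, path, cookie = nil)
  request = Net::HTTP::Get.new(path)
  request['Cookie'] = cookie if cookie
  # third argument nil: do not use a proxy for localhost
  Net::HTTP.start('127.0.0.1', port, nil) { |http| http.request(request) }
end

without_cookie = fetch(port, '/index.jsp?startrow=10')
cookie = without_cookie['Set-Cookie'].split(';').first
with_cookie = fetch(port, '/index.jsp?startrow=10', cookie)

puts without_cookie.body
puts with_cookie.body
```

Mechanize keeps a cookie jar per agent and sends stored cookies back on later requests, which would explain why the second agent.get returns the right page while a single open(q) does not.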
Hope this helps someone.