comp.lang.ruby

Checking for HTML forms with WWW::Mechanize

Charles Pareto

9/20/2007 8:56:00 PM

I want to check which pages have forms and which ones don't, reading the
URLs from a file. I'm using the size of the forms list to make that
determination. But after I get to the 13th URL in the file I get an error
and the script exits. Does anyone know why?

f = File.open("eliminate.txt")
noformfile = File.new("noform.txt", "w+")
formfile = File.new("form.txt", "w+")

agent = WWW::Mechanize.new

begin
  while (line = f.readline)
    page = agent.get(line)
    forms = page.forms
    if forms.size > 0 then
      formfile.puts line
    else
      noformfile.puts line
    end
  end
rescue EOFError
  puts "error"
end
--
Posted via http://www.ruby-....

4 Answers

Charles Pareto

9/21/2007 12:10:00 AM

This is the error message I'm getting:

c:/ruby/lib/ruby/1.8/net/protocol.rb:133:in `sysread': end of file reached (EOFError)
    from c:/ruby/lib/ruby/1.8/net/protocol.rb:133:in `rbuf_fill'
    from c:/ruby/lib/ruby/1.8/timeout.rb:56:in `timeout'
    from c:/ruby/lib/ruby/1.8/timeout.rb:76:in `timeout'
    from c:/ruby/lib/ruby/1.8/net/protocol.rb:132:in `rbuf_fill'
    from c:/ruby/lib/ruby/1.8/net/protocol.rb:116:in `readuntil'
    from c:/ruby/lib/ruby/1.8/net/protocol.rb:126:in `readline'
    from c:/ruby/lib/ruby/1.8/net/http.rb:2017:in `read_status_line'
    from c:/ruby/lib/ruby/1.8/net/http.rb:2006:in `read_new'
    from c:/ruby/lib/ruby/1.8/net/http.rb:1047:in `request'
    from c:/ruby/lib/ruby/gems/1.8/gems/mechanize-0.6.10/lib/mechanize.rb:514:in `fetch_page'
    from c:/ruby/lib/ruby/gems/1.8/gems/mechanize-0.6.10/lib/mechanize.rb:600:in `fetch_page'
    from c:/ruby/lib/ruby/gems/1.8/gems/mechanize-0.6.10/lib/mechanize.rb:185:in `get'
    from ciscoScrape.rb:120
    from ciscoScrape.rb:118:in `each'
    from ciscoScrape.rb:118

Jano Svitok

9/21/2007 3:14:00 PM

On 9/20/07, Chuck Dawit <chuckdawit@gmail.com> wrote:
> I want to check which pages have forms and which ones don't, reading the
> URLs from a file. I'm using the size of the forms list to make that
> determination. But after I get to the 13th URL in the file I get an error
> and the script exits. Does anyone know why?

The error means Mechanize could not read the web page. Find out whether
it really is the 13th URL that fails regardless of the order they are in,
or whether one particular URL is causing the problem (find the offending
URL and try it on its own).

If it's one particular URL, try accessing the page from a browser.
Otherwise, it might be a problem with Mechanize and/or Net::HTTP or
anything that they use.
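
One way to find the offending URL is to rescue the error per URL instead of around the whole loop, so the run continues and the failures are collected. The following is only a minimal sketch: `classify_urls` is a made-up helper name, and `agent` is assumed to be anything that responds to `get(url)` and returns a page with `forms` (WWW::Mechanize in this thread, or a stub for a dry run). The list of rescued error classes is a guess at what a dropped connection can raise, not an exhaustive one.

```ruby
require 'timeout'

# Hypothetical helper: classify URLs one at a time and record which ones fail.
# `agent` only needs to respond to `get(url)` and return an object with `forms`.
def classify_urls(agent, urls)
  result = { :form => [], :noform => [], :failed => [] }
  urls.each do |raw|
    url = raw.strip # drop the trailing newline left by reading lines from a file
    next if url.empty?
    begin
      page = agent.get(url)
      key = page.forms.size > 0 ? :form : :noform
      result[key] << url
    rescue EOFError, IOError, SystemCallError, Timeout::Error => e
      # Keep going, but remember the URL and the error for a closer look.
      result[:failed] << [url, "#{e.class}: #{e.message}"]
    end
  end
  result
end

# Possible use with the files from this thread (not run here):
#   result = classify_urls(WWW::Mechanize.new, File.readlines("eliminate.txt"))
#   result[:failed].each { |url, error| puts "#{url} -> #{error}" }
```

If the same URL shows up in `:failed` on every run, the problem is probably that page or server; if the failures move around, the connection handling is the more likely suspect.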

Finally, a few changes/enhancements, not related to your problem:

File.open("eliminate.txt") do |f|
  noformfile = File.new("noform.txt", "w+")
  formfile = File.new("form.txt", "w+")

  agent = WWW::Mechanize.new

  f.each do |line|
    page = agent.get(line)
    forms = page.forms
    if forms.size > 0 then
      formfile.puts line
    else
      noformfile.puts line
    end
  end
end
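
The block form of `File.open` used above closes the input file automatically; the same idea can be applied to the two output files so they are flushed and closed even if an exception escapes the loop. A rough sketch, assuming a hypothetical helper named `partition_lines` that takes the form check as a block (the Mechanize call appears only in the trailing comment and is not executed here):

```ruby
# Hypothetical helper: route each non-empty line of `input_path` into one of
# two output files, depending on what the block returns for that line.
def partition_lines(input_path, yes_path, no_path)
  File.open(yes_path, "w") do |yes|
    File.open(no_path, "w") do |no|
      File.foreach(input_path) do |raw|
        line = raw.chomp # strip the newline before using the line as a URL
        next if line.empty?
        (yield(line) ? yes : no).puts(line)
      end
    end
  end
end

# With Mechanize, the block would be the form check from this thread, e.g.:
#   agent = WWW::Mechanize.new
#   partition_lines("eliminate.txt", "form.txt", "noform.txt") do |url|
#     agent.get(url).forms.size > 0
#   end
```

Passing the check as a block also makes it possible to try the file handling with a trivial predicate before involving the network.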

Charles Pareto

9/21/2007 3:26:00 PM

This is the error message I'm getting. It's not related to the 13th URL;
it's more like a buffer overflow problem. It will crash on anyone's PC.

c:/ruby/lib/ruby/1.8/net/protocol.rb:133:in `sysread': end of file reached (EOFError)
    from c:/ruby/lib/ruby/1.8/net/protocol.rb:133:in `rbuf_fill'
    from c:/ruby/lib/ruby/1.8/timeout.rb:56:in `timeout'
    from c:/ruby/lib/ruby/1.8/timeout.rb:76:in `timeout'
    from c:/ruby/lib/ruby/1.8/net/protocol.rb:132:in `rbuf_fill'
    from c:/ruby/lib/ruby/1.8/net/protocol.rb:116:in `readuntil'
    from c:/ruby/lib/ruby/1.8/net/protocol.rb:126:in `readline'
    from c:/ruby/lib/ruby/1.8/net/http.rb:2017:in `read_status_line'
    from c:/ruby/lib/ruby/1.8/net/http.rb:2006:in `read_new'
    from c:/ruby/lib/ruby/1.8/net/http.rb:1047:in `request'
    from c:/ruby/lib/ruby/gems/1.8/gems/mechanize-0.6.10/lib/mechanize.rb:514:in `fetch_page'
    from c:/ruby/lib/ruby/gems/1.8/gems/mechanize-0.6.10/lib/mechanize.rb:600:in `fetch_page'
    from c:/ruby/lib/ruby/gems/1.8/gems/mechanize-0.6.10/lib/mechanize.rb:185:in `get'
    from ciscoScrape.rb:120
    from ciscoScrape.rb:118:in `each'
    from ciscoScrape.rb:118

Jano Svitok

9/21/2007 3:48:00 PM

On 9/21/07, Chuck Dawit <chuckdawit@gmail.com> wrote:
> This is the error message I'm getting. It's not related to the 13th URL;
> it's more like a buffer overflow problem. It will crash on anyone's PC.

OK, it seems to be a long-standing problem. Google some words from the
trace, e.g. "sysread end of file reached EOFError", and you will find
posts going back to 2005. My quick search hasn't revealed any solution,
though. You might be luckier.