ruby from command line timing out?

Jason N.Perkins

1/9/2005 1:08:00 AM

I'm running a script from the command line that's going to take a
couple of hours to complete. Between 15 and 20 minutes into its run,
the script throws an execution expired (Timeout::Error). Is there an
environment variable that I should be looking at modifying? The error
message in its entirety is:

/usr/local/lib/ruby/1.8/timeout.rb:42:in `new': execution expired (Timeout::Error)
        from ./spider.rb:6334:in `join'
        from ./spider.rb:6334
        from ./spider.rb:6334:in `each'
        from ./spider.rb:6334

Jason N Perkins

8 Answers

Francis Hwang

1/9/2005 1:15:00 AM


Is it safe to guess, based on the name of the script, that it spiders
web pages? If that's the case, Timeout::Errors are going to happen
quite frequently, whenever a particular web page loads too slowly.

On Jan 8, 2005, at 8:08 PM, Jason N.Perkins wrote:

> I'm running a script from the command line that's going to take a
> couple of hours to complete. Between 15 and 20 minutes into its run,
> the script throws an execution expired (Timeout::Error). Is there an
> environment variable that I should be looking at modifying? The error
> message in its entirety is:
> /usr/local/lib/ruby/1.8/timeout.rb:42:in `new': execution expired
> (Timeout::Error)
> from ./spider.rb:6334:in `join'
> from ./spider.rb:6334
> from ./spider.rb:6334:in `each'
> from ./spider.rb:6334
> --
> Jason N Perkins
> <http://snee...

Francis Hwang

Jason N.Perkins

1/9/2005 1:20:00 AM


On Jan 8, 2005, at 7:14 PM, Francis Hwang wrote:

> Is it safe to guess, based on the name of the script, that it spiders
> web pages? If that's the case, Timeout::Error s are going to happen
> quite frequently as a particular web page loads too slowly.

I'm catching those errors with no problem with a 'rescue'. This seems
to be specific to the script itself.

Jason N Perkins

Bill Atkins

1/9/2005 1:21:00 AM


Can you post the code?


On Sun, 9 Jan 2005 10:19:39 +0900, Jason N. Perkins <jperkins@sneer.org> wrote:
> On Jan 8, 2005, at 7:14 PM, Francis Hwang wrote:
> > Is it safe to guess, based on the name of the script, that it spiders
> > web pages? If that's the case, Timeout::Error s are going to happen
> > quite frequently as a particular web page loads too slowly.
> I'm catching those errors with no problem with a 'rescue'. This seems
> to be specific to the script itself.
> --
> Jason N Perkins
> <http://snee...

$stdout.sync = true
"Just another Ruby hacker.".each_byte do |b|
('a'..'z').step do|c|print c+"\b";sleep 0.007 end;print b.chr
end; print "\n"

Jason N.Perkins

1/9/2005 1:29:00 AM


On Jan 8, 2005, at 7:21 PM, Bill Atkins wrote:

> Can you post the code?

Sure. The blogs variable is an array of the urls of blogs - I intend to
eventually have these urls stored in MySQL, but for now an array works.
I emptied that array so that those sites that I have in it aren't
getting hit by too many people trying to help out. The threading is
derived from a sample in "Programming Ruby." I'd love any additional
feedback outside of dealing with the timeout issue.

#! /usr/local/bin/ruby -w

require 'open-uri'
require 'thread'

blogs = [ ]
buffer = Queue.new

# load the blogs into the queue
blogs.each do |blog|
  buffer.enq( blog )
end

consumers = (1..150).map do |i|
  Thread.new("consumer #{i}") do |name|
    begin
      blog = buffer.deq
      open( blog ) do |content|
        begin
          metas = content.read.scan( /<meta([^(>]*)>/m ).uniq
          metas.each do |current_meta|
            current_meta = current_meta.to_s

            if current_meta =~ /\s+name\s*=\s*[\"']([^\"']+)[\"']/
              name = $1
              current_meta =~ /\s+content\s*=\s*[\"']([^\"']+)[\"']/
              content = $1

              case name
              when "geo.position"
                print "#{blog} \t #{content} \n"

              when "ICBM"
                print "#{blog} \t #{content} \n"
              end
            end
          end
        rescue Exception
          p "#{blog}: $! \n"
        end
      end
    end until buffer == :END_OF_WORK
  end
end

begin
  consumers.size.times{ buffer.enq(:END_OF_WORK) }
  consumers.each{|th| th.join}
rescue Exception
  print $!
end

Jason N Perkins

Francis Hwang

1/9/2005 3:33:00 PM



Is the line 6334 that shows up in the traceback this line:

> consumers.each{|th| th.join}

And one tip, which may not have anything to do with this problem but
might make your code easier to understand and/or debug: Since threading
is so bloody difficult, I try to make it affect as little of the
program as possible. In a case like your code, for example, I would've
let the threaded part simply handle the loading of the web pages, but
let the parsing happen afterward when all the threads have been joined
again. This is how FeedBlender (http://feedblender.ruby...) does
it, so that way if there's a bug I can figure out if it's because of
the threading or not.
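
A minimal sketch of the split described above, with threads that only fetch and parsing that runs after every thread has been joined. To keep it runnable offline, the `pages` hash and the `fetch` lambda are hypothetical stand-ins for `open(url).read`, and the URLs and meta tags are made up; this is not FeedBlender's actual code.

```ruby
require 'thread'

# Hypothetical canned pages standing in for real HTTP fetches.
pages = {
  "http://a.example/" => '<html><head><meta name="ICBM" content="40.7, -74.0"></head></html>',
  "http://b.example/" => '<html><head><meta name="generator" content="x"></head></html>'
}
fetch = lambda { |url| pages.fetch(url) }   # assumption: replace with a real fetch

worker_count = 2
queue = Queue.new
pages.keys.each { |url| queue.enq(url) }
worker_count.times { queue.enq(:END_OF_WORK) }   # one sentinel per worker

bodies = {}
lock = Mutex.new

# Threaded part: download only, no parsing.
workers = (1..worker_count).map do
  Thread.new do
    # Compare the dequeued item (not the queue itself) with the sentinel.
    while (url = queue.deq) != :END_OF_WORK
      body = fetch.call(url)
      lock.synchronize { bodies[url] = body }
    end
  end
end
workers.each { |th| th.join }

# Single-threaded part: parse once all threads are done.
positions = {}
bodies.each do |url, html|
  html.scan(/<meta\b([^>]*)>/m).flatten.each do |attrs|
    name    = attrs[/\bname\s*=\s*["']([^"']+)["']/, 1]
    content = attrs[/\bcontent\s*=\s*["']([^"']+)["']/, 1]
    positions[url] = content if ["geo.position", "ICBM"].include?(name)
  end
end

positions.each { |url, pos| puts "#{url}\t#{pos}" }
```

If a bug shows up in the parsing half, it can be reproduced without any threads by feeding `bodies` in by hand.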

On Jan 8, 2005, at 8:29 PM, Jason N.Perkins wrote:

> On Jan 8, 2005, at 7:21 PM, Bill Atkins wrote:
>> Can you post the code?
> Sure. The blogs variable is an array of the urls of blogs - I intend
> to eventually have these urls stored in MySQL, but for now an array
> works. I emptied that array so that those sites that I have in it
> aren't getting hit by too many people trying to help out. The
> threading is derived from a sample in "Programming Ruby." I'd love any
> additional feedback outside of dealing with the timeout issue.
> #! /usr/local/bin/ruby -w
> require 'open-uri'
> require 'thread'
> blogs = [ ]
> buffer=Queue.new
> # load the blogs into the queue
> blogs.each do |blog|
>   buffer.enq( blog )
> end
> consumers = (1..150).map do |i|
>   Thread.new("consumer #{i}") do |name|
>     begin
>       blog = buffer.deq
>       open( blog ) do |content|
>         begin
>           metas = content.read.scan( /<meta([^(>]*)>/m ).uniq
>           metas.each do |current_meta|
>             current_meta = current_meta.to_s
>             if current_meta =~ /\s+name\s*=\s*[\"']([^\"']+)[\"']/
>               name = $1
>               current_meta =~ /\s+content\s*=\s*[\"']([^\"']+)[\"']/
>               content = $1
>               case name
>               when "geo.position"
>                 print "#{blog} \t #{content} \n"
>               when "ICBM"
>                 print "#{blog} \t #{content} \n"
>               end
>             end
>           end
>         rescue Exception
>           p "#{blog}: $! \n"
>         end
>       end
>     end until buffer == :END_OF_WORK
>   end
> end
> begin
>   consumers.size.times{ buffer.enq(:END_OF_WORK) }
>   consumers.each{|th| th.join}
> rescue Exception
>   print $!
> end
> --
> Jason N Perkins
> <http://snee...

Francis Hwang


1/9/2005 4:49:00 PM


["Jason N.Perkins" <jperkins@sneer.org>, 2005-01-09 02.29 CET]
> begin
> consumers.size.times{ buffer.enq(:END_OF_WORK) }
> consumers.each{|th| th.join}
> rescue Exception
> print $!
> end

I think that when the thread being joined raises a timeout error, join
re-raises it, so the program finishes and the other threads are never
joined. Maybe you should put the begin...rescue around the join (inside
the each).
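
A minimal sketch of that suggestion, assuming three hypothetical worker threads in place of the spider's consumers, one of which raises Timeout::Error. Thread#join re-raises an exception that terminated the thread, so rescuing inside the each lets the remaining threads still be joined.

```ruby
require 'timeout'

# Hypothetical stand-ins for the consumer threads; the second one fails.
workers = [
  Thread.new { :first_done },
  Thread.new { raise Timeout::Error, "execution expired" },
  Thread.new { :third_done }
]

joined = 0
failures = []

workers.each do |th|
  begin
    th.join            # re-raises the exception that killed the thread, if any
    joined += 1
  rescue Timeout::Error => e
    failures << e.message   # note the failure and keep joining the rest
  end
end

puts "joined #{joined} threads cleanly, #{failures.size} failed"
```

With a single begin...rescue wrapped around the whole each, the first failing join would end the loop and the third thread would never be joined.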

Hope this helps. Good luck.

Jason N.Perkins

1/9/2005 5:42:00 PM


On Jan 9, 2005, at 9:33 AM, Francis Hwang wrote:

> Jason,
> Is the line 6334 that shows up in the traceback this line:
>> consumers.each{|th| th.join}

Yeah, that's the line that's timing out and why I was wondering if
there's a global timeout value for the script that I can either modify
up or turn off completely.

> And one tip, which may not have anything to do with this problem but
> might make your code easier to understand and/or debug: Since
> threading is so bloody difficult, I try to make it affect as little of
> the program as possible. In a case like your code, for example, I
> would've let the threaded part simply handle the loading of the web
> pages, but let the parsing happen afterward when all the threads have
> been joined again. This is how FeedBlender
> (http://feedblender.ruby...) does it, so that way if there's a
> bug I can figure out if it's because of the threading or not.

OK, I'll give that a try. Thanks, Francis!

Jason N Perkins

Eric Hodel

1/10/2005 6:27:00 PM


On 09 Jan 2005, at 09:42, Jason N.Perkins wrote:

> On Jan 9, 2005, at 9:33 AM, Francis Hwang wrote:
>> Jason,
>> Is the line 6334 that shows up in the traceback this line:
>>> consumers.each{|th| th.join}
> Yeah, that's the line that's timing out and why I was wondering if
> there's a global timeout value for the script that I can either modify
> up or turn off completely.

Timeout::Error comes from timeout.rb.

Your Timeout::Error probably comes out of HTTP; open-uri doesn't
require timeout and has no timeout blocks.

Try Thread.abort_on_exception = true at the top of your script, and
remove the begin/end block inside the thread.
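
A small sketch of what that setting does, using a hypothetical worker that fails with a made-up ArgumentError rather than the real spider. With Thread.abort_on_exception = true, an unhandled exception in any thread is re-raised in the main thread instead of staying hidden until join is called.

```ruby
Thread.abort_on_exception = true

surfaced = nil
begin
  # Hypothetical worker with no begin/rescue of its own.
  Thread.new { raise ArgumentError, "simulated failure inside a worker" }
  sleep 2   # the main thread is interrupted here when the worker dies
rescue ArgumentError => e
  surfaced = e
end

puts "main thread saw: #{surfaced.class}: #{surfaced.message}" if surfaced
```

In the real script there would be no rescue in the main thread either, so the first failing thread would stop the program with a backtrace that points at the line that actually raised.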

Eric Hodel - drbrain@segment7.net - http://se...
FEC2 57F1 D465 EB15 5D6E 7C11 332A 551C 796C 9F04