Asp Forum - Frustrated: System call timeouts

Mikel Lindsaar

9/6/2008 9:43:00 AM

Hello all,

I am having some (un)fun with timing out database calls.

Basically I have some database calls that go out to a remote database
server on the other side of the planet (using Rails' ActiveRecord).

This all works fine, but occasionally the link gets interrupted,
leaving a stale session, and the whole thing just locks up waiting for
the call to complete (which it never does).

This then hangs the rake task that does a periodic update through
the system cron, and it stays jammed until you go in and reset it,
which is quite annoying.

Trying timeout.rb didn't help, as it does not handle system calls
(except I believe for ones that Ruby makes itself, like file I/O).
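For comparison, here is a minimal sketch of what timeout.rb does catch, assuming a current Ruby. Timeout.timeout raises Timeout::Error in the calling thread once the limit passes, which interrupts Ruby-level waits such as Kernel#sleep. In Ruby 1.8 the timer is itself a (green) Ruby thread, so a C extension blocked inside a native call, such as a driver waiting on a dead socket, never gives that thread a chance to run. The helper name `try_with_timeout` is hypothetical.

```ruby
require 'timeout'

# Hypothetical helper: returns the block's value, or :timed_out if it overran.
def try_with_timeout(seconds)
  Timeout.timeout(seconds) { yield }
rescue Timeout::Error
  :timed_out
end

# A Ruby-level wait such as sleep is interrupted by timeout.rb.
p try_with_timeout(1) { sleep 2; :finished }   # prints :timed_out
# A block that finishes in time returns its value normally.
p try_with_timeout(1) { :finished }            # prints :finished
```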

Trying system-timer (http://ph7spot.com/articles/sy...) from
Philippe Hanrigou also didn't work: same hang, waiting for the call
into the DB driver to return.

The DB stack is Oracle Instant Client, then OCI, then the Oracle
ActiveRecord adapter, all inside ActiveRecord and called from a rake
task that loads the Rails environment. So I am basically calling from
within a full Rails stack on top of Ruby 1.8.6p36.

When the rake task starts, it checks a lock file to see if another
copy is running and exits if so. There is only ever one copy of the
rake task running, so this is not a race condition.
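The post does not say how the lock file check is implemented. One way to get the same guarantee, sketched here as an assumption rather than a description of Mikel's task, is an advisory lock via File#flock with LOCK_NB: a second copy fails to get the lock and can exit at once, and the kernel drops the lock if the holder dies, so a crashed run cannot leave a stale lock behind. The helper name and the lock path are hypothetical.

```ruby
require 'tmpdir'

# Hypothetical helper: yields only if nobody else holds the lock on `path`.
# Returns true after running the block, false if the lock was already taken.
def with_single_instance_lock(path)
  File.open(path, File::RDWR | File::CREAT, 0644) do |f|
    # LOCK_NB makes flock return false instead of waiting for the other holder.
    return false unless f.flock(File::LOCK_EX | File::LOCK_NB)
    yield
    true
  end # closing the file releases the lock
end

lock_path = File.join(Dir.tmpdir, 'update_all.lock') # hypothetical path
ran = with_single_instance_lock(lock_path) { puts 'running update' }
puts 'another copy is running, exiting' unless ran
```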

The timeouts happen while I am finding an individual row of a table
[Model.find(id)]. That is usually a fast operation, but in the context
where I am using it, it is the slowest part of my process, so it seems
to be where the network has the most chance to crap out. In other
words, the problem is probably not that bit of code itself; it is just
where a dropped link is most likely to bite.

Has anyone found a reliable way to time out this sort of call? Or does
anyone have any idea why the system timer would _not_ be timing out
this sort of call?

The hard thing is that I am not 100% sure where it is failing. From
looking at tcpdump and copious logging, I think it is stalling in that
find method, but I can't be certain.

Any pointers from others who have tackled this problem on where to
go from here? As I see it, my options are:

1) Figure out a solution to this problem (preferred)
2) Abandon it and monitor for a zombie by tailing a log file or the
like for inactivity, and then kill it (sounds like a real hack).
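Option 2 need not be much code. A minimal sketch, assuming the rake task writes to a log (or touches a heartbeat file) regularly while it is healthy: a separate cron job treats the file as stale when its modification time is older than some limit, and only then kills the task. The helper name, the paths, and the 10-minute limit are hypothetical.

```ruby
require 'tempfile'

# Hypothetical check: true when `path` has not been modified for `max_age` seconds.
def stale?(path, max_age, now = Time.now)
  now - File.mtime(path) > max_age
end

# Example with a throwaway file standing in for the task's log.
log = Tempfile.new('update_all_log')
puts stale?(log.path, 600)       # just created, so prints false
old = Time.now - 3600
File.utime(old, old, log.path)   # pretend nothing was written for an hour
puts stale?(log.path, 600)       # now prints true

# A watchdog would then do something like (where pid comes from is an assumption):
#   Process.kill('TERM', pid) if stale?(log_path, 600)
```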

Mikel

10 Answers

ara.t.howard

9/6/2008 2:42:00 PM

On Sep 6, 2008, at 3:43 AM, Mikel Lindsaar wrote:

> Hello all,
>
> I am having some (un)fun with timing out a database calls.
>
> Has anyone found a reliable way to timeout this sort of call / does
> anyone have any idea why the system timer would _not_ be timing out
> this sort of call.
>
> Mikel
>

try this

cfp:~/src/ruby > cat timing.rb
Timing.out(2) do
  p 'works'
end

Timing.out(1) do
  begin
    sleep 2
  rescue Timed.out
    p 'times out'
  end
end

Timing.out(1) do
  sleep 2
  p 'blows up'
end

BEGIN {  # runs before the calls above, so Timing exists by then

  module Timing
    class Error < ::StandardError; end

    def Timing.out *seconds, &block
      if seconds.empty?
        return Error  # Timed.out with no args hands back the class for rescue
      else
        seconds = Float seconds.first
      end

      pid = Process.pid
      signaler = IO.popen "ruby -e'sleep #{ seconds };
Process.kill(:TERM.to_s, #{ pid }) rescue nil'"
      thread = Thread.current
      handler = Signal.trap('TERM'){ thread.raise Error, seconds.to_s }
      begin
        block.call
      ensure
        Process.kill 'TERM', signaler.pid rescue nil
        Signal.trap('TERM', handler)
      end
    end

    ::Timed = Timing  # alias so that "rescue Timed.out" reads naturally
  end

}

cfp:~/src/ruby > ruby timing.rb
"works"
"times out"
timing.rb:34:in `out': 1.0 (Timing::Error)
from timing.rb:14:in `call'
from timing.rb:14:in `sleep'
from timing.rb:14
from timing.rb:36:in `call'
from timing.rb:36:in `out'
from timing.rb:13

a @ http://codeforp...
--
we can deny everything, except that we have the possibility of being
better. simply reflect on that.
h.h. the 14th dalai lama

Mikel Lindsaar

9/9/2008 7:01:00 AM

On Sun, Sep 7, 2008 at 12:42 AM, ara.t.howard <ara.t.howard@gmail.com> wrote:
> On Sep 6, 2008, at 3:43 AM, Mikel Lindsaar wrote:
>> Hello all,
>> I am having some (un)fun with timing out a database calls.
> try this:
> <snip>
> pid = Process.pid
> signaler = IO.popen "ruby -e'sleep #{ seconds };
> Process.kill(:TERM.to_s, #{ pid }) rescue nil'"
> thread = Thread.current
> handler = Signal.trap('TERM'){ thread.raise Error, seconds.to_s }
> begin
> block.call
> ensure
> Process.kill 'TERM', signaler.pid rescue nil
> Signal.trap('TERM', handler)
> end

Ara, thank you _so_ much for this.

I would never have thought of spawning suicidal terminator ruby
processes to nuke my process :) But works well.

There was a bit of a delay (I've been putting out fires here for the
past two days), but I got to your code last night and this morning,
and it basically works... except that it doesn't fully kill off the
signaler processes.

This is because two processes get created: first the shell, which
then starts the ruby -e "sleep..." process.

The 'hack' I used to solve this is to replace the ensure block with:

  ensure
    Process.kill 'TERM', signaler.pid rescue nil
    Process.kill('TERM', signaler.pid+1) rescue nil
    Signal.trap('TERM', handler)
  end

But this obviously is insane as it assumes that no other processes get
started on the computer between sh starting up and it firing off the
ruby process.

The ps output looks like this:

$ ps -ef | grep ruby
rails 2153 2152 69 17:04 /usr/sbin/ruby1.8 /usr/bin/rake update:all
rails 2237 2153 69 17:04 sh -c ruby -e'sleep 40.0;?
Process.kill(:TERM.to_s, 2153) rescue nil'
rails 2238 2237 69 17:04 ruby -e'sleep 40.0;?
Process.kill(:TERM.to_s, 2153) rescue nil'

Any ideas on how to reliably find the PID of the ruby process that the
sh process created by IO.popen creates?
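One way to avoid the intermediate shell entirely (not something proposed in this thread, and assuming a POSIX sh plus Ruby 1.9 or later for RbConfig.ruby) is to prefix the command with `exec`: sh then replaces itself with ruby instead of forking it, so the pid IO.popen reports is the ruby process itself and the `signaler.pid+1` guess is no longer needed. The sketch below checks this by having the child print its own pid.

```ruby
require 'rbconfig'
require 'shellwords'

ruby = Shellwords.escape(RbConfig.ruby)   # full path of the running interpreter
script = 'puts Process.pid; $stdout.flush; sleep 30'

# `exec` makes `sh -c` replace itself with ruby rather than spawning a child.
io = IO.popen("exec #{ruby} -e '#{script}'")
reported = Integer(io.gets.strip)         # pid the ruby process reports for itself
same_process = (reported == io.pid)
puts same_process                         # true: io.pid is the ruby process

Process.kill('TERM', io.pid)              # so killing io.pid really stops the sleeper
io.close                                  # and close reaps it, leaving no zombie
```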

Mikel

--
http://lin...
Rails, RSpec and Life blog....

Martin DeMello

9/9/2008 8:48:00 AM

On Tue, Sep 9, 2008 at 12:00 AM, Mikel Lindsaar <raasdnil@gmail.com> wrote:
>
> Ara, thank you _so_ much for this.
>
> I would never have thought of spawning suicidal terminator ruby
> processes to nuke my process :) But works well.

I agree, that was very clever :) Bookmarked in case I ever need this.

martin

Michal Suchanek

9/9/2008 12:04:00 PM

On 09/09/2008, Mikel Lindsaar <raasdnil@gmail.com> wrote:
> On Sun, Sep 7, 2008 at 12:42 AM, ara.t.howard <ara.t.howard@gmail.com> wrote:
> > On Sep 6, 2008, at 3:43 AM, Mikel Lindsaar wrote:
> >> Hello all,
> >> I am having some (un)fun with timing out a database calls.
> Any ideas on how to reliably find the PID of the ruby process that the
> sh process created by IO.popen creates?
>

Since you are using popen anyway, you can just have your ruby process
print its PID when it starts, and read it in your terminator.
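A minimal sketch of that suggestion, assuming Ruby 1.9 or later (for RbConfig.ruby): the child writes its own pid as the first line of output, and the parent reads that line and kills that pid, whether or not a shell sits in between. The variable names are illustrative.

```ruby
require 'rbconfig'
require 'shellwords'

ruby = Shellwords.escape(RbConfig.ruby)
script = 'puts Process.pid; $stdout.flush; sleep 30'

signaler = IO.popen("#{ruby} -e '#{script}'")   # may run via an `sh -c` wrapper
signaler_pid = Integer(signaler.gets.strip)     # first line: the ruby child's own pid

Process.kill('TERM', signaler_pid)              # kill the ruby process directly
signaler.close                                  # wait for the wrapper (if any) to exit
```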

HTH

Michal

ara.t.howard

9/9/2008 2:38:00 PM

On Sep 9, 2008, at 6:10 AM, Michal Suchanek wrote:

> Since you are using popen anyway, you can just have your ruby process
> print its PID when it starts, and read it in your terminator.
>
> HTH

correct. this is basically how systemu does it, which you could use
similarly to this

require 'thread'

q = Queue.new

systemu command do |pid|
  q.push pid
end

pid = q.pop

this bizarre syntax will capture the pid but *also* wait for the
process to start. all it's doing is reading from a pipe, so your
solution seems fine.
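The Queue trick does not depend on systemu. Below is a self-contained sketch, assuming Ruby 1.9 or later: a hypothetical helper `run_and_yield_pid` (built on Process.spawn, standing in for systemu, which is not used here) starts a child, hands its pid to a block, and then waits for the child. Running it in a background thread and popping the queue in the caller makes `q.pop` block until the pid is known.

```ruby
require 'thread'
require 'rbconfig'

# Hypothetical helper: starts a command, yields its pid, then waits for it.
def run_and_yield_pid(*cmd)
  pid = Process.spawn(*cmd)
  yield pid
  Process.wait(pid)   # returns the pid once the child exits
end

q = Queue.new
runner = Thread.new do
  run_and_yield_pid(RbConfig.ruby, '-e', 'sleep 0.5') { |child| q.push(child) }
end

pid = q.pop           # blocks until the background thread has pushed the pid
puts "child started as #{pid}"
runner.join
```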

cheers.

a @ http://codeforp...

ara.t.howard

9/9/2008 2:39:00 PM

On Sep 9, 2008, at 1:07 AM, Mikel Lindsaar wrote:

> On Sun, Sep 7, 2008 at 12:42 AM, ara.t.howard
> <ara.t.howard@gmail.com> wrote:
>> On Sep 6, 2008, at 3:43 AM, Mikel Lindsaar wrote:
>>> Hello all,
>>> I am having some (un)fun with timing out a database calls.
>
> Ara, thank you _so_ much for this.
>
> I would never have thought of spawning suicidal terminator ruby
> processes to nuke my process :) But works well.
>
>

i keep meaning to turn this into a library but have not. any other
advice - besides the pid issue - that you encountered trying to make
it live?

cheers.

a @ http://codeforp...

Mikel Lindsaar

9/10/2008 2:42:00 AM

On Wed, Sep 10, 2008 at 12:45 AM, ara.t.howard <ara.t.howard@gmail.com> wrote:
> i keep meaning to turn this into a library but have not. any other advice -
> besides the pid issue - that you encountered trying to make it live?

No, the pid issue is the only thing... it sometimes misses.

A library hey?

gem install terminator

Terminate.timeout(40) do
... my block
end

:)
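Here is a minimal sketch of what such a library call might wrap, assuming Ruby 1.9 or later and a POSIX sh. The module and method names (`SignalTimeout.timeout`) are hypothetical and not the API of any released gem. It is Ara's signaler process with the two fixes discussed in this thread: `exec` so the popen pid is the sleeping ruby process itself, and `close` so the finished signaler is reaped instead of left as a zombie.

```ruby
require 'rbconfig'
require 'shellwords'

# Hypothetical module: a sketch of the signaler-process timeout from this thread.
module SignalTimeout
  class Error < StandardError; end

  def self.timeout(seconds)
    seconds = Float(seconds)
    target  = Process.pid
    thread  = Thread.current
    ruby    = Shellwords.escape(RbConfig.ruby)

    # The helper sleeps, then sends TERM back to this process.
    # `exec` makes sh replace itself with ruby, so signaler.pid is the sleeper.
    script   = "sleep #{seconds}; Process.kill(:TERM, #{target}) rescue nil"
    signaler = IO.popen("exec #{ruby} -e '#{script}'")
    previous = Signal.trap('TERM') { thread.raise Error, seconds.to_s }

    begin
      yield
    ensure
      Process.kill('TERM', signaler.pid) rescue nil  # stop the helper if still sleeping
      signaler.close                                 # reap it so no zombie is left
      Signal.trap('TERM', previous || 'DEFAULT')     # restore the earlier handler
    end
  end
end

begin
  SignalTimeout.timeout(1) { sleep 3 }
rescue SignalTimeout::Error => e
  puts "timed out after #{e.message} seconds"
end
puts SignalTimeout.timeout(2) { :done }
```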

Mikel


ara.t.howard

9/10/2008 2:53:00 AM

On Sep 9, 2008, at 8:48 PM, Mikel Lindsaar wrote:

> On Wed, Sep 10, 2008 at 12:45 AM, ara.t.howard
> <ara.t.howard@gmail.com> wrote:
>> i keep meaning to turn this into a library but have not. any other
>> advice -
>> besides the pid issue - that you encountered trying to make it live?
>
> No, the pid issue is the only thing... it sometimes misses.
>
> A library hey?
>
> gem install terminator

oh that's good! i can give you commit rights to codeforpeople and we
could release. such a great name! ;-)

a @ http://codeforp...

Roger Pack

9/10/2008 6:56:00 PM

Mikel Lindsaar wrote:
> On Sun, Sep 7, 2008 at 12:42 AM, ara.t.howard <ara.t.howard@gmail.com>
> wrote:
>> begin
>> block.call
>> ensure
>> Process.kill 'TERM', signaler.pid rescue nil
>> Signal.trap('TERM', handler)
>> end
>
> Ara, thank you _so_ much for this.
>
> I would never have thought of spawning suicidal terminator ruby
> processes to nuke my process :) But works well.

There's also a timeout replacement lib [though I haven't tried it].
http://ph7spot.com/articles/sy...
--
Posted via http://www.ruby-....

Mikel Lindsaar

9/11/2008 12:49:00 AM

On Thu, Sep 11, 2008 at 4:55 AM, Roger Pack <rogerpack2005@gmail.com> wrote:
> Mikel Lindsaar wrote:
>> On Sun, Sep 7, 2008 at 12:42 AM, ara.t.howard <ara.t.howard@gmail.com>
>> Ara, thank you _so_ much for this.
> There's also a timeout replacement lib [though I haven't tried it].
> http://ph7spot.com/articles/sy...

Thanks for that, I had already tried it. This doesn't _always_ catch
timed out processes in my experience.


comp.lang.ruby
