Asp Forum - safety of timeout

leon breedt

9/27/2004 10:25:00 PM

hi,

i've seen discussion previously regarding the safety of timeout,
however, as i understand it, this usage is safe (@client is connected
socket)

begin
  timeout(30) do
    line = @client.gets
  end
rescue Timeout::Error
  @client.close
end

am i wrong? all the real work is done elsewhere once complete requests
have been received.

leon

11 Answers

Yukihiro Matsumoto

9/27/2004 11:53:00 PM

Hi,

In message "Re: safety of timeout()"
on Tue, 28 Sep 2004 07:24:48 +0900, leon breedt <bitserf@gmail.com> writes:

|i've seen discussion previously regarding the safety of timeout,
|however, as i understand it, this usage is safe (@client is connected
|socket)

Define 'safety' first. Interpreter should not dump core by this
usage, but no exact 30 seconds guarantee.

matz.

leon breedt

9/28/2004 1:42:00 AM

On Tue, 28 Sep 2004 08:52:33 +0900, Yukihiro Matsumoto
<matz@ruby-lang.org> wrote:
> Define 'safety' first. Interpreter should not dump core by this
> usage, but no exact 30 seconds guarantee.
i'm not too worried about the precision, more just concerned about
cases that can interrupt the timeout.

like when a signal is received from OS while inside blocking code
wrapped by timeout().

how does the scope of trap() work for these situations? does trap()
work at the level of the real process? or current virtual thread?

Yukihiro Matsumoto

9/28/2004 3:04:00 AM

Hi,

In message "Re: safety of timeout()"
on Tue, 28 Sep 2004 10:41:57 +0900, leon breedt <bitserf@gmail.com> writes:

|like when a signal is received from OS while inside blocking code
|wrapped by timeout().

Signal is received immediately (at the latest safe point), and is
delivered to the main thread.

|how does the scope of trap() work for these situations? does trap()
|work at the level of the real process? or current virtual thread?

As stated above, trap works in the main thread.
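
A minimal sketch of that behaviour, assuming a Unix-like platform (SIGUSR1 is not available on Windows) and a current CRuby. The names handler_thread and worker are made up for illustration. The signal is sent from a second thread, yet the trap block runs on the main thread:

```ruby
# Assumption: Unix-like OS and CRuby; names below are illustrative only.
handler_thread = nil

# The trap handler records which thread it was run on.
Signal.trap("USR1") { handler_thread = Thread.current }

# Send the signal to our own process from a non-main thread.
worker = Thread.new do
  Process.kill("USR1", Process.pid)
  sleep(0.2) # give the interpreter time to run the handler
end
worker.join

puts "handled on main thread? #{handler_thread.equal?(Thread.main)}"
```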

matz.

Brian Candler

9/28/2004 11:47:00 AM

On Tue, Sep 28, 2004 at 07:24:48AM +0900, leon breedt wrote:
> i've seen discussion previously regarding the safety of timeout,
> however, as i understand it, this usage is safe (@client is connected
> socket)
>
> begin
>   timeout(30) do
>     line = @client.gets
>   end
> rescue Timeout::Error
>   @client.close
> end
>
> am i wrong? all the real work is done elsewhere once complete requests
> have been received.

You're not wrong; that usage is safe. The general problem is that the timeout thread raises an exception
asynchronously. In the above case your main work is just @client.gets, but
if it were doing something more important, that work could be interrupted.
In particular, even work within an 'ensure' block is interrupted. It's a
common pattern to use 'ensure' to do cleanup work, but if the timeout occurs
at just the wrong time, the cleanup may not be completed.

This program demonstrates it:

---- 8< ------------------------
require 'timeout'

def bar
  sleep(4)
  raise "wibble" # optional
end

def foo
  bar
ensure
  # The timeout fires at 5s, one second into this cleanup.
  puts "Cleanup started..."
  sleep(2)
  puts "Cleanup finished"
end

begin
  Timeout.timeout(5) do
    foo
  end
rescue Exception => e
  p e
end
---- 8< ------------------------

You can try it both with and without the 'raise "wibble"' line. In both
cases the cleanup code in the 'ensure' block does not complete.

As a more realistic example, imagine some code like this:

timeout(30) do
  File.open("mylog","a") do |f|
    # ... do stuff
  end
end

File.open with a block does several things:
1. open the file
2. yield it to the block
3. close the file in an 'ensure' section

If you were really unlucky, and the timeout occurred at exactly the moment
the 'ensure' section was being executed, then the file could remain
open.

At least, that's what I understand to be the crux of the issue. In many
cases it's not going to be a major concern. You may be able to rewrite the
code to make it safer by pushing the timeouts down to the lowest possible
level. The above example could be rewritten safely as:

File.open("mylog","a") do |f|
  timeout(30) do
    # ... do stuff
  end
end
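
A small sketch of why this ordering helps, assuming a temporary file in place of "mylog" and a one-second sleep standing in for "do stuff" (both are illustrative choices, as are the variable names). The timeout can only fire inside the inner block, so the 'ensure' in File.open runs undisturbed and the file ends up closed:

```ruby
require 'timeout'
require 'tempfile'

log = Tempfile.new('mylog')   # stand-in for the real log file
handle = nil
closed_after_timeout = nil

begin
  File.open(log.path, 'a') do |f|
    handle = f
    # Only the work itself is covered by the timeout.
    Timeout.timeout(0.2) do
      f.puts 'starting'
      sleep(1)                # stand-in for "do stuff" that overruns
    end
  end
rescue Timeout::Error
  # File.open's own ensure has already run by the time we get here.
  closed_after_timeout = handle.closed?
end

puts "file closed after timeout? #{closed_after_timeout}"
log.close!
```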

Regards,

Brian.

Paul Brannan

9/28/2004 2:10:00 PM

On Tue, Sep 28, 2004 at 08:47:29PM +0900, Brian Candler wrote:
> At least, that's what I understand to be the crux of the issue. In many
> cases it's not going to be a major concern. You may be able to rewrite the
> code to make it safer by pushing the timeouts down to the lowest possible
> level. The above example could be rewritten safely as:
>
> File.open("mylog","a") do |f|
>   timeout(30) do
>     # ... do stuff
>   end
> end

I like your description of the problem, Brian. I've tried to explain
this before and not been able to articulate it quite so well.

I would like to add one additional problem to your explanation, though:
timeout exceptions can prevent you from knowing whether or not an
operation actually succeeded. We have this problem with CORBA timeouts
in particular; a remote call is made, and the process on the other side
is running slowly. It does, however, complete the operation, just as
the timeout exception is being fired. So do I treat the operation as
success (since I know the request reached the other side, as I didn't
get a communications failure exception), or do I treat the operation as
failure (since I did get an exception and I did fail to get a return
value from the call)?

A solution I've used in the past has been to use an event loop and let
it handle the timeouts. The timeout then occurs only when the event
loop has control; when the timeout does occur, a proc is called that
handles the timeout. This proc may raise an exception or it may take
some other action.

For long-running operations, I periodically yield control to the event
loop when it is safe.
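
A rough sketch of the idea, not the actual code referred to above: a tiny loop built on IO.select in which a timeout is just a proc that the loop calls once its deadline has passed. TinyEventLoop and all of its method names are hypothetical. No exception is raised asynchronously; the proc decides what to do.

```ruby
# Hypothetical minimal event loop: timers fire only when the loop has control.
class TinyEventLoop
  Timer = Struct.new(:deadline, :callback)

  def initialize
    @readers = {}   # io => proc called when io becomes readable
    @timers  = []
  end

  def on_readable(io, &callback)
    @readers[io] = callback
  end

  # Register a proc to be called once `seconds` have elapsed.
  def after(seconds, &callback)
    timer = Timer.new(now + seconds, callback)
    @timers << timer
    timer
  end

  def cancel(timer)
    @timers.delete(timer)
  end

  # One pass: wait for I/O or the next deadline, then run whatever is due.
  def run_once
    return if @readers.empty? && @timers.empty?
    next_deadline = @timers.map(&:deadline).min
    wait = next_deadline && [next_deadline - now, 0].max
    ready, = IO.select(@readers.keys, nil, nil, wait)
    (ready || []).each { |io| @readers[io]&.call(io) }
    due, @timers = @timers.partition { |timer| timer.deadline <= now }
    due.each { |timer| timer.callback.call }
  end

  private

  def now
    Process.clock_gettime(Process::CLOCK_MONOTONIC)
  end
end

# Usage: nothing is ever written to the pipe, so the timer proc runs.
reader, _writer = IO.pipe
event_loop = TinyEventLoop.new
outcome = nil

timer = event_loop.after(0.2) { outcome = :timed_out }
event_loop.on_readable(reader) do |io|
  outcome = io.gets
  event_loop.cancel(timer)
end

event_loop.run_once until outcome
puts outcome.inspect
```

A long-running callback would have to return (or call run_once itself) at points where interruption is safe, which is the "periodically yield control" part.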

Paul

Brian Candler

9/28/2004 2:58:00 PM

> I would like to add one additional problem to your explanation, though:
> timeout exceptions can prevent you from knowing whether or not an
> operation actually succeeded. We have this problem with CORBA timeouts
> in particular; a remote call is made, and the process on the other side
> is running slowly. It does, however, complete the operation, just as
> the timeout exception is being fired. So do I treat the operation as
> success (since I know the request reached the other side, as I didn't
> get a communications failure exception), or do I treat the operation as
> failure (since I did get an exception and I did fail to get a return
> value from the call)?

I think that's a broader problem, and not specific to Ruby timeouts.

In the simplest case it's a pure race: with a 30 second timeout, what
happens if you get a response after 29.99 seconds or 30.01 seconds? In my
opinion, the borderline case doesn't really matter; you can assert that if a
Timeout::Error is fired, then you did not get a response in time. If the
server subsequently responds after 30.01 or 40 or 50 seconds, then tough; it
was too late, by definition.

However, I guess what you're really worried about is something more
fundamental: did the command actually complete on the CORBA server? Did the
server change state? Should I resubmit the command later?

You can see that this cannot be handled by timeouts alone. For example:

(1) the command might have completed after 27 seconds, but due to network
congestion, the reply did not get back until after 32 seconds. (=> the
command completed in time, but you were unable to detect this)

(2) the command might continue to execute after your timeout exception
fires, and complete after say 31 seconds. Even if the timeout exception then
goes on to drop the CORBA connection or try to chase the command with an
"abort" message of some sort, it's still a race which might be lost.

In order to be able to tell with certainty whether your command was accepted
AND acted upon, I believe you really need to use a sequence-number type of
mechanism, where both ends keep track of which messages they have sent and
have been acknowledged by the other side.

---> submit command N
     ... timeout
<--- response N "1234" (ignored by client, it was too late)

---> resubmit command N
<--- response N "1234" (from cache)

---> submit command N+1
<--- response N+1 "9876" etc.

Each end needs to keep track of the last command or response sent, so that
it can be resent if necessary. At the server side, if command N is received
a second time, the previous (remembered) response is resent; that's because
the command already executed the first time and changed the system state, so
attempting to perform the command again could fail. The client sending
command N+1 is an implicit acknowledgement that the response from command N
has been received, and no longer needs to be remembered.
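
The server side of that exchange might be sketched as follows. This is an in-memory illustration only: SequencedServer, submit, and the client ids are invented names, and a real implementation would need the persistence and reset procedures discussed below.

```ruby
# Hypothetical sketch: remember the last response per client and resend it
# when the same sequence number arrives again.
class SequencedServer
  def initialize(&handler)
    @handler = handler
    @last = {}   # client_id => [sequence number, remembered response]
  end

  def submit(client_id, seq, command)
    last_seq, last_response = @last[client_id]

    # Command N received a second time: resend, do not re-execute.
    return last_response if last_seq == seq

    if last_seq && seq != last_seq + 1
      raise ArgumentError, "unexpected sequence number #{seq} (last was #{last_seq})"
    end

    # A new N+1 implicitly acknowledges response N, which is overwritten here.
    response = @handler.call(command)
    @last[client_id] = [seq, response]
    response
  end
end

executions = 0
server = SequencedServer.new do |command|
  executions += 1
  "done: #{command}"
end

first          = server.submit("client-a", 1, "debit 10")
retry_response = server.submit("client-a", 1, "debit 10")  # resubmit after timeout
second         = server.submit("client-a", 2, "debit 5")

puts "executed #{executions} times; retry matched first? #{first == retry_response}"
```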

Unfortunately, building a protocol like this *properly* is difficult, and if
done right you will end up with something which looks very much like TCP or
the X25 link layer. You need procedures to initialise the sequence numbers
and reset them in the case of gross errors, such as one end or the other
forgetting its sequence number. Ideally the sequence numbers and message
buffers should be persistent across application restarts (i.e. they are
stored in a database). Each "logical connection" between two endpoints needs
to be distinct with its own sequence number set. And you will need to choose
appropriate retransmission parameters.

This is a common problem though, and I'd certainly like to see a generic
encapsulation protocol which handles it properly. I think if done right, it
would work over multiple transport layers (e.g. HTTP POST, or even exchanges
of E-mail messages). If anyone knows of such a thing, I'd love to see it.

Regards,

Brian.

Ara.T.Howard

9/28/2004 4:01:00 PM

Paul Brannan

9/28/2004 4:38:00 PM

On Tue, Sep 28, 2004 at 03:57:58PM +0100, Brian Candler wrote:
> In order to be able to tell with certainty whether your command was
> accepted AND acted upon, I believe you really need to use a
> sequence-number type of mechanism, where both ends keep track of which
> messages they have sent and have been acknowledged by the other side.

I don't think sequencing messages is sufficient to solve the problem.
A protocol like what you describe provides reliable messaging, but
not much more. For example, suppose I want to fail over to the backup
system if I time out -- I can do this, but I run the risk of performing
the operation more than once. At that point it becomes a question of
policy (can I afford to take that risk, or is that risk truly
necessary?).

If there were any easy solutions, then a lot of real-time researchers
would be out of work.

Paul

leon breedt

9/28/2004 8:31:00 PM

Hi,

Thanks for the detailed elaborations, folks. Good to know I'm not
unique in finding this non-trivial to do as correctly as possible :)

On Wed, 29 Sep 2004 01:37:43 +0900, Paul Brannan <pbrannan@atdesk.com> wrote:
> I don't think sequencing messages is sufficient to solve the problem.
> A protocol like what you describe provides reliable messaging, but
> not much more. For example, suppose I want to fail over to the backup
> system if I time out -- I can do this, but I run the risk of performing
> the operation more than once. At that point it becomes a question of
> policy (can I afford to take that risk, or is that risk truly
> necessary?).
In my case, I'm lucky enough that each operation requires only one
message from the client to my server, so it becomes a matter of being
able to safely determine the identity of the request so that a
subsequent request with the same identity would be discarded.

In my case, both the primary and backup system would use the same
RDBMS data source to keep track of what's been processed.

This server exists purely to prevent the problem of clients
accidentally submitting the same request twice, in the realm of credit
card payments.

Determining the identity correctly to allow valid second attempts
through is interesting. At the moment, I use serial numbers as well,
but I'm not entirely happy with this, as there was no negotiation
process to obtain these.

I also provide the guarantee to the client app that as soon as I've
acknowledged a request, it's been persisted, and the server will
attempt to process it until it gets a deterministic OK/FAILED result.

So if the client times out before receiving my acknowledgement, and
they resubmit, they'll receive the in-progress error, and can send a
query to determine the status.
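
As a rough illustration of that flow (not the actual server): RequestLedger, its method names, and the "payment-42" identity are all hypothetical, and a Hash stands in for the shared RDBMS table.

```ruby
# Hypothetical sketch of the accept / in-progress / status-query flow.
class RequestLedger
  class InProgress < StandardError; end

  def initialize
    @status = {}   # request_id => :in_progress, :ok or :failed (stand-in for a DB table)
  end

  # Record the request before acknowledging it.
  def accept(request_id)
    case @status[request_id]
    when nil
      @status[request_id] = :in_progress
      :acknowledged
    when :in_progress
      raise InProgress, "request #{request_id} is still being processed"
    else
      @status[request_id]   # already has a deterministic result
    end
  end

  def finish(request_id, result)
    @status[request_id] = result
  end

  def status(request_id)
    @status.fetch(request_id, :unknown)
  end
end

ledger = RequestLedger.new
first_ack = ledger.accept("payment-42")

# The client timed out waiting for the acknowledgement and resubmits.
resubmit = begin
  ledger.accept("payment-42")
rescue RequestLedger::InProgress
  :in_progress_error
end

ledger.finish("payment-42", :ok)
final = ledger.status("payment-42")

puts [first_ack, resubmit, final].inspect
```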

Leon

Guillaume Marcais

9/28/2004 10:46:00 PM

On Tue, 2004-09-28 at 12:14, Ara.T.Howard@noaa.gov wrote:

> - http://www.s...
> - http://raa.ruby-lang.org/project/...
>
> i have a patched version of the latest ruby binding.

What does your patch fix/improve?

Guillaume.

comp.lang.ruby
