Asp Forum - replacing the use of gettimeofday in the scheduler

Tomas Pospisek

3/1/2007 2:45:00 PM

Using gettimeofday in the scheduler is problematic, since the system time can
jump forward or backward when the user or ntp resets the clock [1]. This can
have side effects such as sleep() and timeout() never returning, so that threads
are never scheduled again, and it also seems to cause other problems [3].
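The usual alternative to gettimeofday for measuring elapsed time is POSIX clock_gettime(CLOCK_MONOTONIC), a clock that does not jump when someone sets the system time. Here is a minimal C sketch of that idea. It is not Ruby's code, and the helper names monotonic_seconds and elapsed_since are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/* Current reading of the monotonic clock, in seconds. Unlike gettimeofday(),
   CLOCK_MONOTONIC does not jump when someone sets the system time. */
static double monotonic_seconds(void)
{
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc == 0);
    (void)rc;                 /* avoid an unused-variable warning under NDEBUG */
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Seconds elapsed since an earlier monotonic_seconds() reading. */
static double elapsed_since(double start)
{
    return monotonic_seconds() - start;
}
```

A scheduler could record start = monotonic_seconds() when a thread goes to sleep and later compare elapsed_since(start) with the requested duration. That comparison is unaffected by the clock resets described above.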

Eric Hodel argues [2] that replacing the existing mechanism (libc-select to
sleep, gettimeofday to calculate the time that has actually elapsed) with
libc-sleep is also problematic because:

"[for libc-sleep]... system activity may lengthen the sleep by an indeterminate
amount."

However, this applies equally to libc-select, so replacing the
select/gettimeofday mechanism with libc-sleep should work at least no worse.
Objections?
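The select/gettimeofday mechanism can be sketched in C as a loop with three steps. First, compute a deadline. Next, call select() with no file descriptors and a timeout equal to the remaining time. Finally, recompute the remaining time whenever select() returns early, for example with EINTR. This is a hypothetical sketch, not Ruby's implementation, and the function names are made up. It reads CLOCK_MONOTONIC where the mechanism described above reads gettimeofday(), so the deadline no longer depends on the settable system time.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sys/select.h>
#include <time.h>

static double now_monotonic(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Sleep for at least `seconds`, using select() as the timer.
   Returns 0 on success, -1 if select() fails for a reason other than EINTR. */
static int select_sleep(double seconds)
{
    assert(seconds >= 0.0);
    double deadline = now_monotonic() + seconds;

    for (;;) {
        double remaining = deadline - now_monotonic();
        if (remaining <= 0.0)
            return 0;                          /* deadline reached */

        long usec = (long)(remaining * 1e6) + 1;   /* round up to whole µs */
        struct timeval tv;
        tv.tv_sec = usec / 1000000;
        tv.tv_usec = usec % 1000000;

        if (select(0, NULL, NULL, NULL, &tv) < 0 && errno != EINTR)
            return -1;                         /* a real error */
        /* Timed out or interrupted: loop and recompute the remaining time. */
    }
}
```

The loop decides when to stop by reading a clock, not by trusting that select() slept the whole timeout. Swapping which clock it reads is therefore enough to remove the dependency on wall-clock time.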

Has there been any effort to implement a solution based on sleep/usleep? Is
there interest in implementing a more robust scheduler timing mechanism? Is
there a chance that a patch based on sleep/usleep would make it into CVS?
*t

[1] http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-t...
[2] http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-t...
[3] http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-t...



Avdi Grimm

3/1/2007 3:12:00 PM

On 3/1/07, Tomas Pospisek <tpo2@sourcepole.ch> wrote:
> However, this applies in exactly the same way to libc-select as well and thus
> replacing the select/gettimeofday mechanism by libc-sleep should at least work
> no worse. Objections?

My first reaction was: good god, the scheduler uses wallclock time?!
Speaking as someone who works on realtime systems (and thus has to
think about scheduler implementation often), this is never a good
idea. I don't know the background for Ruby's scheduler design, but
normally I'd regard a scheduler which uses wallclock time as just
plain *broken*. I'm heartily in favor of changing it to something
which isn't dependent on the clock.

That "indeterminate amount" referenced above is simply the price you
pay for running in userspace in a modern multitasking OS. Yes,
system activity could delay the return. That's what multitasking
means: you don't get to choose when you get the CPU. In practice, if
applications are experiencing unacceptable latency in OS scheduling
then 1) your gettimeofday()-based implementation is going to be
delayed right along with everything else; and 2) you have bigger
problems, because your system is overloaded.

Cheers,

--
Avdi

Tomas Pospisek

3/1/2007 4:10:00 PM

Quoting Tomas Pospisek <tpo2@sourcepole.ch>:

> Has there been any effort to implement a [scheduling] solution based on
> sleep/usleep? Is the interest to implement a more robust schedule timing
> mechanism? Is there a chance for a patch based on sleep/usleep to make it
> into CVS?

The GNU libc sleep(3) manpage suggests that, on non-glibc systems, sleep and
SIGALRM don't get along. From the POSIX spec [1]:

"If a SIGALRM signal is generated for the calling process during execution
of sleep(), except as a result of a prior call to alarm(), and if the
SIGALRM signal is not being ignored or blocked from delivery, it is
unspecified whether that signal has any effect other than causing sleep()
to return."

(Thus it is possible that Ruby's signal handler for SIGALRM will *not* be
executed.)

Since Ruby *does* allow the user to handle SIGALRM, an implementation based on
libc-sleep would not work correctly on such systems, which don't handle SIGALRM
and sleep together gracefully, whenever the user's program uses SIGALRM itself.

Does anybody know how relevant that is? I.e., does Ruby run on such systems at
all? The above would seem to rule out implementing scheduler waiting with
libc-sleep, since that would break Ruby on such systems in these "corner cases".

?
*t

[1] http://www.opengroup.org/onlinepubs/009695399/functions/...
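One commonly suggested way around the sleep(3)/SIGALRM ambiguity is nanosleep(). POSIX specifies that nanosleep() has no effect on the action or blockage of any signal. When a signal handler interrupts it, it fails with EINTR and reports the unslept time. The following C sketch sleeps with nanosleep(), lets a user-level SIGALRM handler run, and then resumes with the remaining time. It is not a proposed patch, and all function names are hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <time.h>

static volatile sig_atomic_t alarm_seen = 0;

/* Stand-in for a user's own SIGALRM handler (e.g. a Ruby-level trap). */
static void on_alarm(int signo)
{
    (void)signo;
    alarm_seen = 1;
}

/* Install on_alarm without SA_RESTART, so nanosleep() reports EINTR. */
static void install_alarm_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    int rc = sigaction(SIGALRM, &sa, NULL);
    assert(rc == 0);
    (void)rc;
}

/* Arm a one-shot SIGALRM after `ms` milliseconds using setitimer(ITIMER_REAL). */
static void arm_alarm_ms(long ms)
{
    struct itimerval it;
    memset(&it, 0, sizeof it);
    it.it_value.tv_sec = ms / 1000;
    it.it_value.tv_usec = (ms % 1000) * 1000;
    int rc = setitimer(ITIMER_REAL, &it, NULL);
    assert(rc == 0);
    (void)rc;
}

/* Sleep for the full interval even if signal handlers interrupt it.
   Returns 0 on success, -1 on an error other than EINTR. */
static int sleep_full(struct timespec req)
{
    struct timespec rem;
    while (nanosleep(&req, &rem) != 0) {
        if (errno != EINTR)
            return -1;
        req = rem;            /* a handler ran; sleep for the rest */
    }
    return 0;
}
```

Here the user's handler runs and the sleep still lasts its full length. Whether this holds on every platform Ruby supports is exactly the open question above.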


Eric Hodel

3/1/2007 8:11:00 PM

On Mar 1, 2007, at 07:12, Avdi Grimm wrote:
> On 3/1/07, Tomas Pospisek <tpo2@sourcepole.ch> wrote:
>> However, this applies in exactly the same way to libc-select as
>> well and thus
>> replacing the select/gettimeofday mechanism by libc-sleep should
>> at least work
>> no worse. Objections?
>
> My first reaction was: good god, the scheduler uses wallclock time?!
> Speaking as someone who works on realtime systems (and thus has to
> think about scheduler implementation often), this is never a good
> idea. I don't know the background for Ruby's scheduler design, but
> normally I'd regard a scheduler which uses wallclock time as just
> plain *broken*. I'm heartily in favor of changing it to something
> which isn't dependent on the clock.

The Ruby thread scheduler uses setitimer(2) and select(2). It
depends on the wall-clock for implementing features defined in terms
of the wall-clock (Kernel#sleep and Thread#join).
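For readers unfamiliar with the setitimer(2) half of this design, here is a minimal C sketch of timer-driven preemption in a green-thread runtime. A periodic interval timer delivers a signal, and the handler only sets a flag. The interpreter loop polls that flag to decide when to switch threads. The following details are assumptions made for illustration, not a description of Ruby's source:

- the 10 ms interval
- the use of ITIMER_VIRTUAL/SIGVTALRM (a timer that advances only while the process runs in user mode)
- all the names

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>

/* Set asynchronously by the timer, polled by the interpreter loop. */
static volatile sig_atomic_t switch_pending = 0;

static void on_timeslice(int signo)
{
    (void)signo;
    switch_pending = 1;       /* do no real work inside the handler */
}

/* Start a periodic timer that requests a thread switch every `usec`
   microseconds of user CPU time consumed by the process. */
static void start_timeslice_timer(long usec)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_timeslice;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    int rc = sigaction(SIGVTALRM, &sa, NULL);
    assert(rc == 0);

    struct itimerval it;
    it.it_interval.tv_sec = usec / 1000000;
    it.it_interval.tv_usec = usec % 1000000;
    it.it_value = it.it_interval;
    rc = setitimer(ITIMER_VIRTUAL, &it, NULL);
    assert(rc == 0);
    (void)rc;
}

/* Disarm the timer (an all-zero it_value stops it). */
static void stop_timeslice_timer(void)
{
    struct itimerval it;
    memset(&it, 0, sizeof it);
    setitimer(ITIMER_VIRTUAL, &it, NULL);
}

/* Hypothetical interpreter loop: "execute" steps until the timer asks for
   a switch, then acknowledge it. Returns the number of steps executed. */
static unsigned long run_until_switch(void)
{
    volatile unsigned long steps = 0;
    while (!switch_pending)
        steps++;              /* stand-in for evaluating interpreter code */
    switch_pending = 0;       /* a real scheduler would switch threads here */
    return steps;
}
```

Preemption in this pattern needs no clock reading at all. The wall-clock dependency Eric describes comes only from Kernel#sleep and Thread#join, which are specified in elapsed seconds.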

> That "indeterminate amount" referenced above is simply the price you
> pay for running in userspace in a modern multitasking OS. Yes,
> system activity could delay the return. That's what multitasking
> means: you don't get to choose when you get the CPU. In practice, if
> applications are experiencing unacceptable latency in OS scheduling
> then 1) your gettimeofday()-based implementation is going to be
> delayed right along with everything else; and 2) you have bigger
> problems, because your system is overloaded.

Kernel#sleep behaves differently in Ruby programs using threads. If
you sleep in a thread you end up context switching to other threads
instead of calling sleep(3).

Since you aren't using sleep(3) in threaded mode, Ruby instead uses
gettimeofday(2) to implement Kernel#sleep for the calling thread (has
this thread slept its N seconds?), so you may sleep longer than you
expect.

The other place gettimeofday(2) is used is Thread#join's timeout, for
a similar reason.
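Eric's description can be made concrete with a small C sketch of the bookkeeping a green-thread scheduler needs:

- Each sleeping thread stores a wakeup deadline.
- A thread is woken once its deadline has passed.
- The time until the earliest deadline becomes the select() timeout.

The struct gthread type and both functions are hypothetical. The `now` argument is meant to come from a monotonic clock rather than gettimeofday(). A clock reading is needed at all because select() can return early, for instance when a file descriptor becomes ready or a signal arrives. The remaining time therefore has to be recomputed after every return.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical green-thread record: only the fields this sketch needs. */
struct gthread {
    int    sleeping;     /* 1 while waiting in a Kernel#sleep-like call */
    double wakeup_at;    /* deadline on a monotonic clock, in seconds */
};

/* "Has this thread slept its N seconds?" */
static int gthread_should_wake(const struct gthread *t, double now)
{
    return t->sleeping && now >= t->wakeup_at;
}

/* How long the scheduler may block in select(): the time until the
   earliest sleeper's deadline, 0 if one is already due, or -1 meaning
   "no sleepers, block until I/O or a signal arrives". */
static double next_select_timeout(const struct gthread *ts, size_t n, double now)
{
    double best = -1.0;
    for (size_t i = 0; i < n; i++) {
        if (!ts[i].sleeping)
            continue;
        double left = ts[i].wakeup_at - now;
        if (left < 0.0)
            left = 0.0;
        if (best < 0.0 || left < best)
            best = left;
    }
    return best;
}
```

Thread#join with a timeout fits the same scheme: the joining thread stores a deadline and is woken either when the target thread finishes or when the deadline passes.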

Avdi Grimm

3/1/2007 8:16:00 PM

On 3/1/07, Eric Hodel <drbrain@segment7.net> wrote:
> The Ruby thread scheduler uses setitimer(2) and select(2). It
> depends on the wall-clock for implementing features defined in terms
> of the wall-clock (Kernel#sleep and Thread#join).

<snip>

Thanks for the explanation. I'm probably missing something, but I'm
confused about why the functionality you describe in Kernel#sleep and
Thread#join can't be implemented using only select(). Can you clarify?

Thanks,

--
Avdi

Tomas Pospisek

3/1/2007 9:16:00 PM

Tomas Pospisek

3/1/2007 9:19:00 PM

Tomas Pospisek

3/1/2007 9:30:00 PM

MenTaLguY

3/1/2007 9:49:00 PM

On Fri, 2 Mar 2007 06:30:12 +0900, "Tomas Pospisek's Mailing Lists" <tpo2@sourcepole.ch> wrote:

> What do you mean by "defined in terms of the wall-clock"?

"wall-clock" refers to real elapsed time, rather than CPU elapsed time. It's better to base your scheduler on CPU elapsed time, since on a heavily loaded system, a "wall-clock"-based scheduler will just thrash without getting much useful work done.

Since there aren't widespread standard APIs for CPU-time-based interrupts, most runtimes with "green thread" schedulers that are based on CPU time approximate it by counting reductions, VM instructions, or AST nodes traversed.
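The reduction-counting approach can be sketched in C as a toy round-robin loop. It hands each thread a fixed budget of "reductions" (here, single steps standing in for VM instructions) and never consults a clock. The budget of 1000 and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define REDUCTION_BUDGET 1000   /* steps per time slice (assumed value) */

/* Hypothetical green thread, reduced to a count of pending work. */
struct vthread {
    long remaining;             /* steps left before the thread finishes */
    long executed;              /* steps run so far */
};

/* Run all threads round-robin, preempting each one after REDUCTION_BUDGET
   steps. Returns the number of time slices handed out. */
static long run_round_robin(struct vthread *ts, size_t n)
{
    long slices = 0;
    size_t live = 0;
    for (size_t i = 0; i < n; i++)
        if (ts[i].remaining > 0)
            live++;

    size_t cur = 0;
    while (live > 0) {
        struct vthread *t = &ts[cur];
        if (t->remaining > 0) {
            long budget = REDUCTION_BUDGET;
            while (budget > 0 && t->remaining > 0) {
                t->remaining--;   /* "execute" one VM instruction */
                t->executed++;
                budget--;
            }
            if (t->remaining == 0)
                live--;
            slices++;
        }
        cur = (cur + 1) % n;      /* switch to the next thread */
    }
    return slices;
}
```

With threads needing 2500 and 1000 steps, this hands out four slices: 1000, 1000, 1000, then 500. On a loaded machine each slice simply takes longer in real time, but no thread is starved by wall-clock bookkeeping.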

-mental

Eric Hodel

3/1/2007 10:20:00 PM

On Mar 1, 2007, at 13:30, Tomas Pospisek's Mailing Lists wrote:
> On Fri, 2 Mar 2007, Eric Hodel wrote:
>> The Ruby thread scheduler uses setitimer(2) and select(2). It
>> depends on the wall-clock for implementing features defined in
>> terms of the wall-clock (Kernel#sleep and Thread#join).
>
> You need to add Timeout#timeout to this.

Nope. Timeout calls Kernel#sleep in a thread.

> But:
>
> $ ri Kernel#sleep
>
> Suspends the current thread for _duration_ seconds (which may be
> any number, including a +Float+ with fractional seconds). Returns
> the actual number of seconds slept (rounded), which may be less
> than that asked for if another thread calls +Thread#run+. Zero
> arguments causes +sleep+ to sleep forever.
>
> No reference to wall-clock in there. What do you mean by "defined
> in terms of the wall-clock"?

When I write "sleep 5" I expect at least five seconds on the clock on
my wall to go by before the next statement is executed.

Eric Hodel

3/1/2007 10:27:00 PM

On Mar 1, 2007, at 13:30, Tomas Pospisek's Mailing Lists wrote:

> The problem is that when you set system time into the past by a
> month, then your thread will also sleep for a month and not, as you
> probably expected, only a few seconds. Which is actually the hint
> for the solution... to be followed.

I don't see how this could be confused for a bug in Ruby.
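The scenario Tomas describes is plain arithmetic, and a short C sketch shows it. A sleeping thread's remaining time is its deadline minus the current clock reading. If the wall clock is set back by 30 days after the deadline was computed, the remaining time grows by 30 days. A deadline on a monotonic clock is unaffected. The function names and simulated clock readings are hypothetical, and no real clock is read.

```c
#include <assert.h>

#define DAY (24.0 * 60.0 * 60.0)

/* Remaining sleep time for a deadline, given a clock reading. */
static double remaining(double deadline, double now)
{
    double left = deadline - now;
    return left > 0.0 ? left : 0.0;
}

/* "sleep 5" started at wall time t; one second later the clock is set back
   by `jump` seconds. Returns the remaining time the scheduler would see. */
static double wall_clock_remaining_after_jump(double t, double jump)
{
    double deadline = t + 5.0;        /* deadline taken from the wall clock */
    double now = t + 1.0 - jump;      /* user or ntp set the clock back */
    return remaining(deadline, now);
}

/* Same scenario on a monotonic clock: setting the time does not move it. */
static double monotonic_remaining_after_jump(double m, double jump)
{
    (void)jump;                       /* the reset has no effect here */
    double deadline = m + 5.0;
    double now = m + 1.0;
    return remaining(deadline, now);
}
```

With the wall clock, a thread that asked for five seconds still has 30 days and 4 seconds to go after the reset. With the monotonic clock it has the expected 4 seconds.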

comp.lang.ruby