Asp Forum - Thinking about Threaded IO

James Gray

10/3/2004 8:37:00 PM

I've not used Ruby's threads before, so I have what will probably be
some basic questions. I'm pretty familiar with using native thread
systems, but these "in processor" threads raise some questions for me.

I'm especially wondering about IO. I've read that Ruby's threads can
be dangerous when making possibly lengthy calls to the operating
system. How does that affect threaded servers in Ruby? If you call
gets() on a socket, does the program hang until that socket produces
input containing a \n character? If so, what's the best solution? Use
non-blocking IO techniques?

I guess that's a pretty specific example, and I am interested in the
effects of this type of threading on networking code, but let me ask
something more general. When should I stop and worry about whether an
action I'm threading will stall the whole program? Where is this
generally a problem?

Thanks.

James Edward Gray II

12 Answers

David G. Andersen

10/3/2004 10:42:00 PM

On Mon, Oct 04, 2004 at 05:37:20AM +0900, James Edward Gray II scribed:
> I've not used Ruby's threads before, so I have what will probably be
> some basic questions. I'm pretty familiar with using native thread
> systems, but these "in processor" threads raise some questions for me.
>
> I'm especially wondering about IO. I've read that Ruby's threads can
> be dangerous when making possibly lengthy calls to the operating
> system. How does that affect threaded servers in Ruby? If you call
> gets() on a socket, does the program hang until that socket produces
> input containing a \n character? If so, what's the best solution? Use
> non-blocking IO techniques?
>
> I guess that's a pretty specific example, and I am interested in the
> effects of this type of threading on networking code, but let me ask
> something more general. When should I stop and worry about whether an
> action I'm threading will stall the whole program? Where is this
> generally a problem?

Ruby's threads seem "generally pretty good" about not blocking
on IO calls _if_ you do them right. An example from a program
I was working on:

mybuf.sbuf += s.sysread(65536)
vs
mybuf.sbuf += s.sysread(4096)

The former caused (all of) Ruby to block. The latter
was handled properly. I _assume_, but didn't verify,
that this is because Ruby was doing something like

select()
read(foo)

internally, and the read with the huge blocksize scrogged
things by blocking anyway. But I was being stupidly lazy
trying to sysread such a large blocksize anyway ...
is/was this a Ruby bug? Perhaps. But easily worked around.

In _general_, you shouldn't have to use nonblocking IO,
but there are likely operations that Ruby can't make internally
nonblocking. DNS lookups are often a pain to perform
asynchronously unless you explicitly use an async
DNS library, for instance. File operations over
NFS can block and are next to impossible to
deal with without either multiple processes or
kernel-level multithreading (particularly metadata
operations like lookup and open).

-Dave

--
work: dga@lcs.mit.edu me: dga@pobox.com
MIT Laboratory for Computer Science http://www....

Brian Candler

10/4/2004 8:39:00 AM

On Mon, Oct 04, 2004 at 07:42:14AM +0900, David G. Andersen wrote:
> Ruby's threads seem "generally pretty good" about not blocking
> on IO calls _if_ you do them right. An example from a program
> I was working on:
>
> mybuf.sbuf += s.sysread(65536)
> vs
> mybuf.sbuf += s.sysread(4096)
>
> The former caused (all of) Ruby to block. The latter
> was handled properly. I _assume_, but didn't verify,
> that this is because Ruby was doing something like
>
> select()
> read(foo)
>
> internally

Essentially that's right. Was there any reason to use 'sysread' rather than
'read'? I think that

mybuf.sbuf += s.read(65536)

probably would have worked as you'd expected.

> In _general_, you shouldn't have to use nonblocking IO,
> but there are likely operations that Ruby can't make internally
> nonblocking. DNS lookups are often a pain to perform
> asynchronously unless you explicitly use an async
> DNS library, for instance. File operations over
> NFS can block and are next to impossible to
> deal with without either multiple processes or
> kernel-level multithreading (particularly metadata
> operations like lookup and open).

All good points. I'd just add that things like external database libraries
(e.g. mysql) tend to block too. I've seen some which don't; the Oracle OCI8
binding for ruby has the ability to be put in a 'nonblocking' mode, but what
it actually does is poll for a result after 1ms, 2ms, 4ms, 8ms, and so on!

For such applications, separate processes are often essential. For web
applications I've had a lot of success with fcgi, where a pool of persistent
processes is set up by Apache under mod_fastcgi, and each one only handles a
single request at a time. This means you don't have to worry about thread
safety as well as blocking.
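
The separate-process approach can be sketched in a few lines. This is only an illustration, assuming a platform where fork is available (so not a native Windows build); run_in_child is a hypothetical helper name, and the sleep stands in for a blocking database call:

```ruby
# Hypothetical helper: run a possibly blocking job in a child process
# and hand its result back to the parent over a pipe.
def run_in_child
  reader, writer = IO.pipe
  pid = fork do
    reader.close
    writer.write(Marshal.dump(yield))  # serialise the block's return value
    writer.close
    exit!(0)                           # leave the child without running at_exit hooks
  end
  writer.close                         # parent keeps only the read end
  result = Marshal.load(reader.read)   # reads until the child closes its end
  reader.close
  Process.wait(pid)                    # reap the child
  result
end

# Stand-in for a slow call into an external library.
answer = run_in_child { sleep 0.1; 6 * 7 }
puts answer
```

Because the slow call runs in its own process, nothing it does can stall the interpreter in the parent, and the child shares no state with it, so thread safety is not a concern either.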

Regards,

Brian.

James Gray

10/4/2004 4:11:00 PM

On Oct 4, 2004, at 3:39 AM, Brian Candler wrote:

> On Mon, Oct 04, 2004 at 07:42:14AM +0900, David G. Andersen wrote:
>>
>> The former caused (all of) Ruby to block. The latter
>> was handled properly. I _assume_, but didn't verify,
>> that this is because Ruby was doing something like
>>
>> select()
>> read(foo)
>>
>> internally
>
> Essentially that's right. Was there any reason to use 'sysread' rather
> than
> 'read'? I think that
>
> mybuf.sbuf += s.read(65536)
>
> probably would have worked as you'd expected.

So with a big read, you can still hang waiting for the bytes? Are you
suggesting read() would have handled this better than sysread()?
Where does that leave gets()?

Thanks.

James Edward Gray II

Brian Candler

10/4/2004 4:25:00 PM

On Tue, Oct 05, 2004 at 01:10:48AM +0900, James Edward Gray II wrote:
> >'read'? I think that
> >
> > mybuf.sbuf += s.read(65536)
> >
> >probably would have worked as you'd expected.
>
> So with a big read, you can still hang waiting for the bytes? Are you
> suggesting read() would have handled this better than sysread()?
> Where does that leave gets()?

IO#read and IO#gets work properly; in other words Ruby wraps the calls
appropriately to make sure they never block the interpreter engine.

If you use sysread then you're telling Ruby to bypass what it knows, and
just call the underlying O/S function directly. In that case, you should
know what you are doing before you ask for it!

Checking with the source, IO#sysread checks the FD is ready (essentially
using select()) and then does a single read() operation of the size
requested:

n = fileno(fptr->f);
rb_thread_wait_fd(fileno(fptr->f));
TRAP_BEG;
n = read(fileno(fptr->f), RSTRING(str)->ptr, RSTRING(str)->len);
TRAP_END;

whereas IO#read goes via rb_io_fread, which reads only as much data as is
available at a time, appending it to a string. IO#gets goes via appendline,
which also checks how much data is available before reading it.
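
The "read only what is available and append it to a string" idea can be approximated in plain Ruby. This is a rough sketch, not the interpreter's actual code: it assumes a modern Ruby where IO#read_nonblock accepts exception: false, read_in_pieces is a made-up name, and IO.pipe stands in for a socket:

```ruby
# Accumulate up to `want` bytes, taking only what is available on each pass.
def read_in_pieces(io, want)
  buf = +""
  while buf.bytesize < want
    piece = io.read_nonblock(want - buf.bytesize, exception: false)
    case piece
    when :wait_readable
      IO.select([io])   # nothing there yet: park this thread until data arrives
    when nil
      break             # end of stream
    else
      buf << piece      # append whatever arrived
    end
  end
  buf
end

reader, writer = IO.pipe
producer = Thread.new do
  3.times do |i|
    writer.write("part#{i} ")
    sleep 0.05
  end
  writer.close
end

data = read_in_pieces(reader, 65_536)
producer.join
puts data
```

Asking for 65536 bytes here does not stall the producer thread, because each pass takes only the bytes already waiting and then goes back to waiting on select.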

Regards,

Brian.

James Gray

10/4/2004 4:38:00 PM

On Oct 4, 2004, at 11:25 AM, Brian Candler wrote:

> IO#read and IO#gets work properly; in other words Ruby wraps the calls
> appropriately to make sure they never block the interpreter engine.

Thank you for the excellent information. I must say that this makes me
fear Ruby's threads a lot less. They seem extremely well thought out.

James Edward Gray II

David G. Andersen

10/5/2004 7:35:00 PM

On Mon, Oct 04, 2004 at 09:38:57AM +0100, Brian Candler scribed:
> >
> > The former caused (all of) Ruby to block. The latter
> > was handled properly. I _assume_, but didn't verify,
> > that this is because Ruby was doing something like
> >
> > select()
> > read(foo)
> >
> > internally
>
> Essentially that's right. Was there any reason to use 'sysread' rather than
> 'read'? I think that
>
> mybuf.sbuf += s.read(65536)
>
> probably would have worked as you'd expected.

Think so too. I don't remember why I switched it to sysread -
I think I was having problems telling ruby to read as much
as possible from the file descriptor without blocking, and I
didn't want to have to make it explicitly nonblocking and
abandon the happiness of threads.
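
For "read as much as possible without blocking", later Ruby versions offer IO#readpartial, which returns as soon as any data is present instead of waiting for the full size. A small sketch, assuming such a Ruby; read_what_is_there is a hypothetical helper and IO.pipe stands in for the file descriptor:

```ruby
# Hypothetical helper: return whatever is available, up to `limit` bytes,
# or nil once the other end has closed and nothing is left.
def read_what_is_there(io, limit = 4096)
  io.readpartial(limit)   # waits only if no data at all is available
rescue EOFError
  nil
end

reader, writer = IO.pipe
writer.write("only eleven")                   # 11 bytes are waiting
chunk = read_what_is_there(reader, 65_536)    # large limit, still returns at once
puts chunk

writer.close
at_eof = read_what_is_there(reader, 65_536)   # nothing left and writer closed
puts at_eof.inspect
```

The limit is an upper bound, not a target, so a large value such as 65536 is harmless here.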

-dave

--
work: dga@lcs.mit.edu me: dga@pobox.com
MIT Laboratory for Computer Science http://www....

Kevin McConnell

10/6/2004 2:57:00 PM

Brian Candler wrote:

> IO#read and IO#gets work properly; in other words Ruby wraps the calls
> appropriately to make sure they never block the interpreter engine.

Not sure if any of you use Windows, but it's probably worth pointing out
that on Windows IO#gets *does* block all threads.

E.g. if I run the following:

thread = Thread.new { i=0; while(true); puts i+=1; sleep 1; end }
sleep 10
str = $stdin.gets
sleep 10

It should display the numbers 1 to 10, and then go suspiciously quiet
until you provide the input to gets. Once you do that, you'll get the
numbers 11 to 20.

Cheers,
Kevin

James Gray

10/6/2004 3:11:00 PM

On Oct 6, 2004, at 9:59 AM, Kevin McConnell wrote:

> Brian Candler wrote:
>
>> IO#read and IO#gets work properly; in other words Ruby wraps the calls
>> appropriately to make sure they never block the interpreter engine.
>
> Not sure if any of you use Windows, but it's probably worth pointing
> out that on Windows IO#gets *does* block all threads.
>
> E.g. if I run the following:
>
> thread = Thread.new { i=0; while(true); puts i+=1; sleep 1; end }
> sleep 10
> str = $stdin.gets
> sleep 10
>
> It should display the numbers 1 to 10, and then go suspiciously quiet
> until you provide the input to gets. Once you do that, you'll get the
> numbers 11 to 20.

Is that because you are calling gets() on STDIN here? Would it
behave the same if we were dealing with sockets instead?
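
One way to check the socket case is a variation of the same test over a local TCP connection. This is a sketch only; the loopback address, the timings, and the variable names are arbitrary choices, and the result will depend on the Ruby build and platform:

```ruby
require "socket"

server = TCPServer.new("127.0.0.1", 0)   # port 0: let the OS pick a free port
port = server.addr[1]

ticks = 0
ticker = Thread.new { loop { ticks += 1; sleep 0.05 } }

client = TCPSocket.new("127.0.0.1", port)
conn = server.accept

# Send a line after a delay, from another thread.
sender = Thread.new { sleep 0.5; client.puts "hello" }

before = ticks
line = conn.gets      # waits for a complete line on the socket
after = ticks

sender.join
ticker.kill
[client, conn, server].each(&:close)

puts "gets returned #{line.inspect}"
puts "ticker advanced while waiting: #{after > before}"
```

If gets on the socket really did stall every thread, the sender thread would never get to run and the script would simply hang, which would itself answer the question.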

James Edward Gray II

Robert Klemme

10/6/2004 3:14:00 PM

"Kevin McConnell" <kevin_mcconnell@hotmail.com> wrote in message
news:10m81t9hldp0s2f@corp.supernews.com...
> Brian Candler wrote:
>
> > IO#read and IO#gets work properly; in other words Ruby wraps the calls
> > appropriately to make sure they never block the interpreter engine.
>
> Not sure if any of you use Windows, but it's probably worth pointing out
> that on Windows IO#gets *does* block all threads.
>
> E.g. if I run the following:
>
> thread = Thread.new { i=0; while(true); puts i+=1; sleep 1; end }
> sleep 10
> str = $stdin.gets
> sleep 10
>
> It should display the numbers 1 to 10, and then go suspiciously quiet
> until you provide the input to gets. Once you do that, you'll get the
> numbers 11 to 20.

That's not true for the cygwin build:

17:12:51 [ruby]: uname -a
CYGWIN_NT-5.0 bond 1.5.10(0.116/4/2) 2004-05-25 22:07 i686 unknown unknown Cygwin
17:12:54 [ruby]: ruby -v
ruby 1.8.1 (2003-12-25) [i386-cygwin]
17:12:58 [ruby]: cat ioblock.rb
thread = Thread.new { i=0; while(true); puts i+=1; sleep 1; end }
sleep 10
print "prompt> "
print "got: ", $stdin.gets, "\n"
sleep 10
17:13:01 [ruby]: ruby ioblock.rb
1
2
3
4
5
6
7
8
9
10
prompt> 11
12
13
foo
got: foo

14
15
16
ioblock.rb:5:in `sleep': Interrupt
        from ioblock.rb:5

Kind regards

robert

Kevin McConnell

10/6/2004 4:36:00 PM

James Edward Gray II wrote:

> Is that because you are calling gets() on STDIN here? Would it behave
> the same if we were dealing with sockets instead?

I haven't had time to check that, but I'll try to do so later. (I'd
guess it still blocks, but that's based more on cynicism than anything
else :-)

Robert Klemme wrote:

> That's not true for the cygwin build:

Sorry, I should have pointed that out. I know it works OK on a cygwin
build, just not on a visual studio build following the instructions in
the win32 directory.

Sorry for any confusion.

Kevin

comp.lang.ruby
