Bill Kelly
7/20/2007 9:52:00 AM
From: "Greg Fodor" <gfodor@gmail.com>
> On Jul 20, 1:06 am, "Bill Kelly" <bi...@cts.com> wrote:
>> From: "Greg Fodor" <gfo...@gmail.com>
>>
>> > This helped us track down a nasty bug that was occurring due to the
>> > lack of a timeout in Net::HTTP during SSL connect, which still seems
>> > to be busted in ruby trunk.
>>
>> Could you describe this in more detail, or possibly post a diff
>> of the changes you made to fix the bug? (We're about to ship a
>> product with embedded ruby using Net::HTTP and SSL, so it would
>> be great to be able to eliminate any such lurking bug.)
>
> We didn't fix it directly; we worked around it by timing out all
> requests in the outer caller. The bug seems to be inside of def
> connect: there is a call to "s.connect" if ssl is enabled, and this
> call is not timed out. Some of our processes were hanging on this
> call.
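
For anyone following along, the outer-caller workaround described above might look roughly like the sketch below. It is only an illustration, not the code Greg's team actually used: the helper names with_deadline and fetch_with_deadline are made up for this example, and it assumes a Ruby where requiring 'net/http' is enough to get SSL support. Timeout relies on a second thread, so it can only fire when the interpreter gets a chance to switch threads.

```ruby
require 'timeout'
require 'net/http'
require 'uri'

# Hypothetical helper (not part of Net::HTTP): run a block under a hard
# deadline and return nil if the deadline passes before the block finishes.
def with_deadline(seconds)
  Timeout.timeout(seconds) { yield }
rescue Timeout::Error
  nil
end

# Hypothetical caller: the whole request, including the SSL handshake done
# while the connection is being opened, runs under the outer deadline.
def fetch_with_deadline(url, seconds)
  uri = URI.parse(url)
  with_deadline(seconds) do
    http = Net::HTTP.new(uri.host, uri.port)
    http.use_ssl = (uri.scheme == 'https')
    http.start { |conn| conn.get(uri.request_uri) }
  end
end
```

A caller would then treat a nil result as "timed out" and retry or give up, rather than leaving the process hung in the connect call.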

Interesting. We've been seeing an issue with "s.connect" as well,
but only on Windows (ruby 1.8.4), and oddly only when ruby is
embedded into our C++ app, and only the *first* time the SSL
connect takes place.

For us, we'd see the CPU pegged for about 20 seconds down in
openssl.so -> ssleay.dll -> libeay.dll. But it would eventually
return. After that, all subsequent SSL connect calls would
execute quickly.

I was wondering if it was doing some one-time generation of a
private key or something... (But why only when ruby was
embedded in our C++ app? Something missing from the environment,
I wondered...?)

Anyway, I wasn't getting very far debugging it as I didn't have
symbols for ruby or the ssl libraries. (I was using binaries
from the One-click installer.) So I built ruby 1.8.4 and
openssl locally with debug symbols, updating to a newer version
of OpenSSL (0.9.8e) in the process.

The result: The unexplained "s.connect" delay seems to have
vanished.

I would be happier if I knew what had been causing the problem;
maybe it's still lurking. But it used to happen like clockwork,
and since rebuilding ruby with a newer OpenSSL, I've yet to see
the problem again.

Incidentally, our app also runs on OS X, and I have yet to see
this "s.connect" problem over there.

What platform(s) are you seeing it on? In your case, it sounded
like it may have been hanging indefinitely on you, as opposed to
being a ~20 second delay that would eventually return?

Regards,

Bill