comp.lang.ruby

cURL in ruby? Faster than Net::HTTP?

Ben Johnson

8/23/2006 3:18:00 AM

I've found a couple of packages that claim to integrate the curl library
into ruby. Which one is the standard library?

Also, the reason I am asking is that I ran some tests and found that
curl is quite a bit faster than the HTTP library. Is this true? Maybe my
tests were distorted, but curl seemed to be quite a bit faster at
initializing the connection and downloading.

Would it be smart of me to switch from Net::HTTP to curl? A tenth of a
second is precious in my application.

Thanks for your help.

--
Posted via http://www.ruby-....

17 Answers

Corey Jewett

8/23/2006 3:51:00 AM

0

I can't speak to the speed of any curl library, but I can cite my
recent experience building a crawler-like app. I'm using non-blocking
sockets, so I can't use Net::HTTP and am hand-coding HTTP
directly. Under OS X I found a lot of latency (around 100ms) for both
IPSocket.getaddress() and Socket.sockaddr_in(). Under Linux, packing
the sockaddr seems to incur a negligible cost. Under OS X I pack the
sockaddr manually (yeah, it's gross). To mitigate the host lookup
cost I maintain a cache (a Hash) of host => IP. (At least under OS
X, even resolving localhost takes 100ms, even on repeat calls.)

The point is that I would assume Net::HTTP inherits the costs
of these two calls, which would explain at least some of the
connection slowness. As for download speed, I could only make
guesses, and they'd be pretty uneducated. I would suspect C has
better I/O performance than Ruby, so a native library would probably
be faster.
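
A minimal sketch of the kind of host => IP cache described above. The
HostCache class and its injectable resolver are hypothetical names made up
for illustration, not code from the crawler; only IPSocket.getaddress is the
real call being cached.

```ruby
require 'socket'

# Hypothetical helper: caches host => IP so each name is resolved only once.
class HostCache
  # The resolver is injectable so the cache can be exercised without a
  # network; by default it calls IPSocket.getaddress.
  def initialize(resolver = ->(host) { IPSocket.getaddress(host) })
    @resolver = resolver
    @cache = {}
    @lock = Mutex.new
  end

  # Returns the cached IP string, resolving only on the first request.
  def lookup(host)
    @lock.synchronize { @cache[host] ||= @resolver.call(host) }
  end
end
```

Entries in this sketch never expire, so a long-running process would want
some way to drop stale addresses.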

Corey



Ezra Zygmuntowicz

8/23/2006 4:01:00 AM

0


Hey Ben-

I haven't used the libcurl bindings myself so I can't comment on
those. But you may want to look at Zed's rfuzz project[1]. It is for
testing web apps, but he also says that it is a faster replacement for
net/http. Since its HTTP parser is written in C, using the same parser
that Mongrel does, it should be faster than net/http.

Cheers-
-Ezra

[1] http://www.zedshaw.com/proje...

why the lucky stiff

8/23/2006 4:13:00 AM

0

On Wed, Aug 23, 2006 at 12:17:42PM +0900, Ben Johnson wrote:
> Also the reason I am asking is because I did some tests and came to find
> out that curl is quite a bit faster than the HTTP library. Is this true,
> maybe my tests were distorted, but curl seemed to be quite a bit faster
> in initializing the connection and downloading.

The cURL library is indeed very fast, but it also suffers from a problem that
Net::HTTP suffers from: its DNS lookup is not asynchronous and will block your
process. To overcome that, you'll need c-ares[1], which will probably also need
to be wrapped as an extension.

In my experience, Net::HTTP actually performs much better when you use Ruby's
non-blocking DNS resolver:

require 'resolv-replace'

I wrote a cURL extension and benchmarked it against Net::HTTP with
resolv-replace and wasn't completely impressed with the speed difference,
so I abandoned the extension.
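
A rough sketch of how one might time a Net::HTTP request with resolv-replace
loaded. The timed_get helper is a hypothetical name, and this is not the
benchmark referred to above; comparing runs with and without the first
require line would be one way to see whether the resolver matters on a given
machine.

```ruby
require 'resolv-replace' # swaps the socket classes' name lookups for Resolv
require 'net/http'
require 'uri'
require 'benchmark'

# Hypothetical helper: fetch a URL and report how long the request took.
# Returns [status code as a String, elapsed seconds as a Float].
def timed_get(url)
  response = nil
  seconds = Benchmark.realtime do
    response = Net::HTTP.get_response(URI.parse(url))
  end
  [response.code, seconds]
end
```

Calling timed_get with a URL of your own several times in a row gives a
crude picture of connection plus download time.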

_why

[1] http://daniel.haxx.se/projec...

Ben Johnson

8/23/2006 4:21:00 AM

0

What do you mean when you say the DNS lookup is not asynchronous and will
block my process? If I were to call curl directly from the command line
using `curl` in Ruby, wouldn't that be much faster? In that case it would
get its own process and take better advantage of a dual-processor
system. Am I correct? What I planned on doing was just using curl
directly from the command line, unless there is a downside to this.


snacktime

8/23/2006 4:37:00 AM

0

>
> What do you mean when you say the DNS lookup is not asynchronous and will
> block my process? If I were to call curl directly from the command line
> using `curl` in Ruby, wouldn't that be much faster? In that case it would
> get its own process and take better advantage of a dual-processor
> system. Am I correct? What I planned on doing was just using curl
> directly from the command line, unless there is a downside to this.
>

From my understanding, DNS lookups block in Ruby, as in they stop the
whole program until the name is resolved. I can't imagine that forking
another process would be more efficient than using net/http.

Corey Jewett

8/23/2006 4:41:00 AM

0


On Aug 22, 2006, at 9:21 PM, Ben Johnson wrote:

> What do you mean when you say the DNS lookup is not asynchronous and will
> block my process? If I were to call curl directly from the command line
> using `curl` in Ruby, wouldn't that be much faster? In that case it would
> get its own process and take better advantage of a dual-processor
> system. Am I correct? What I planned on doing was just using curl
> directly from the command line, unless there is a downside to this.

No, Kernel#` doesn't give you a process that runs alongside yours. It
does start a subprocess, but it blocks your current process and waits
for the subprocess to return. See Kernel.fork and Process.detach.
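
A small sketch of the difference, under some assumptions: the two helper
names are hypothetical, and the background version uses Process.spawn plus
Process.detach rather than a bare Kernel.fork, which is a substitution made
here to keep the example short.

```ruby
# Blocking: backticks start a subprocess, but the caller waits for its output.
def run_blocking(command)
  `#{command}`
end

# Non-blocking: start the command and return immediately.
# Returns [read end of a pipe carrying stdout, reaper thread for the child].
def run_in_background(*command)
  reader, writer = IO.pipe
  pid = Process.spawn(*command, out: writer)
  writer.close                   # parent keeps only the read end
  waiter = Process.detach(pid)   # reaper thread; avoids a zombie process
  [reader, waiter]
end

# Example use (assumes a curl binary on the PATH; the URL is a placeholder):
#   reader, waiter = run_in_background("curl", "-s", "--max-time", "5",
#                                      "http://example.com/")
#   body = reader.read      # blocks only when the result is actually needed
#   waiter.value.success?   # exit status of curl
```

Reading the pipe before asking for the exit status matters here, since a
child that fills the pipe would otherwise never finish.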

Also, there are some gems that could probably help you out. Ara T.
Howard's slave[1] comes to mind.

Corey

1. http://codeforpeople.com/lib/r...

Ben Johnson

8/23/2006 6:32:00 AM

0

snacktime wrote:
> From my understanding, DNS lookups block in Ruby, as in they stop the
> whole program until the name is resolved. I can't imagine that forking
> another process would be more efficient than using net/http.

In my program each curl request would be in its own thread. I also think
forking a new process by using `` would be quicker, mainly because I
am doing this on a dual-processor server. Having everything run under
one process doesn't take advantage of that. Lastly, curl has a timeout
option, so if for some reason the request didn't get a response it would
time out. I also noticed that when running curl and Net::HTTP side by
side, curl wins hands down. There is even a hitch of about 0.5 to 1
second in Net::HTTP right before the request is made.
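
For comparison, Net::HTTP can also be given timeouts. A minimal sketch,
assuming a hypothetical helper name and arbitrary timeout values; only
open_timeout and read_timeout are the library's own settings.

```ruby
require 'net/http'

# Hypothetical helper: a Net::HTTP object with explicit timeouts (in seconds).
# Nothing is connected yet; the connection opens on the first request.
def http_with_timeouts(host, port = 80, open_timeout: 2, read_timeout: 5)
  http = Net::HTTP.new(host, port)
  http.open_timeout = open_timeout  # max wait for the connection to open
  http.read_timeout = read_timeout  # max wait for each read from the socket
  http
end
```

A request made through the returned object raises a timeout error instead
of hanging once either limit is exceeded.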

Am I wrong here?

What I'm probably going to do is implement the curl functionality in my
program and post the speed differences for future reference, unless
someone tells me I'm going about this all wrong.

Thanks a lot for everyone's help.


daniel.haxx

8/24/2006 7:14:00 AM

0

why the lucky stiff wrote:

> The cURL library is indeed very fast, but it also suffers from a problem that
> Net::HTTP suffers from: its DNS lookup is not asynchronous and will block your
> process.

libcurl offers an asynchronous API that does the name resolving
asynchronously if you've built libcurl to do so.

why the lucky stiff

8/24/2006 6:55:00 PM

0

On Thu, Aug 24, 2006 at 04:15:02PM +0900, daniel.haxx@gmail.com wrote:
> libcurl offers an asynchronous API that does the name resolving
> asynchronously if you've built libcurl to do so.

Does it use the native getaddrinfo()? The problem I've had on FreeBSD
is that getaddrinfo() will block.

_why

David Vallner

8/24/2006 7:06:00 PM

0

Ben Johnson wrote:
> What do you mean when you say the DNS lookup is not asynchronous and will
> block my process? If I were to call curl directly from the command line
> using `curl` in Ruby, wouldn't that be much faster? In that case it would
> get its own process and take better advantage of a dual-processor
> system. Am I correct? What I planned on doing was just using curl
> directly from the command line, unless there is a downside to this.

Odds are the process startup would take up more time than you'd gain.
IMO that's NOT a good way to leverage a dual-core processor. Doing hacks
like this only makes sense in a CPU-intensive application (which curl
hardly is), and you want to split the work between two (or maybe more)
*threads* more or less equally. You also want these threads being
managed in a thread pool to avoid OS thread initialisation time. For
added hilarity, you need native threads for this, not green threads -
the OS can't schedule those on different cores.

Technically, you could do this using processes instead of threads.
But once again, the process initialisation time and the time it takes
to transfer data between the processes would have to be outweighed by
the performance gained from eliminating context switches, which just
might not be all that easy.
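
A minimal sketch of the thread-pool idea, assuming a hypothetical fetch_all
helper that takes the actual fetching as a block. Note that in MRI since 1.9
threads are native, but a global lock keeps Ruby code from running on two
cores at once; blocking I/O in different threads does overlap, which is
what this kind of pool relies on.

```ruby
# Hypothetical helper: run the given block over urls with a fixed worker pool.
# Results come back in the same order as the input urls.
def fetch_all(urls, pool_size: 4, &fetch)
  jobs = Queue.new
  urls.each_with_index { |url, index| jobs << [index, url] }
  pool_size.times { jobs << nil }          # one stop marker per worker

  results = Array.new(urls.size)
  workers = Array.new(pool_size) do
    Thread.new do
      while (job = jobs.pop)               # a nil marker ends this worker
        index, url = job
        results[index] = fetch.call(url)   # each job writes its own slot
      end
    end
  end
  workers.each(&:join)
  results
end
```

The block could wrap Net::HTTP or a curl binding; the pool itself does not
care which.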

David Vallner