comp.lang.ruby

fast text parsing

snacktime

10/31/2006 5:02:00 AM

Say I'm parsing stuff like HTTP headers, what is going to give better
performance? Strings with regular expressions? StringIO with
readline? Splitting strings into arrays on a delimiter? Or is it
going to be so close it's not really an issue?

Chris

3 Answers

Ara.T.Howard

10/31/2006 5:15:00 AM

Jeremy Hinegardner

10/31/2006 5:42:00 AM

On Tue, Oct 31, 2006 at 02:14:42PM +0900, ara.t.howard@noaa.gov wrote:
> On Tue, 31 Oct 2006, snacktime wrote:
>
> >Say I'm parsing stuff like HTTP headers, what is going to give better
> >performance? Strings with regular expressions? StringIO with readline?
> >Splitting strings into arrays on a delimiter? Or is it going to be so
> >close it's not really an issue?
> >
> >Chris
>
> if you try to write your regular expressions badly enough they can surely
> use the most cpu ;-)

It all 'depends' :-) If you're doing HTTP header parsing, why not just
use the header parsing in Mongrel? It's already available as a C
extension, so it's probably not going to get much faster than that.

But if you want to stick strictly with Ruby parsing, experiment and see
what works. I was parsing all the Netflix[1] data with Ruby for fun and
I found out some interesting things about text parsing, at least on my
laptop:

- if you only need the data between two delimiters, it was
faster to call String#index twice and slice the data out of the
middle than to split and index into the array

- but if you wanted 3 items out, it was faster to do the
split.

- for simple parsing, regexes were overkill, but if you want to use
them, make sure to compile them once and use them MANY times
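
For anyone who wants to check the points above on their own machine, here is a rough sketch of how the comparison might be set up with the standard Benchmark module. The sample record, the header line, the regex and the helper names (middle_by_index, middle_by_split, parse_header) are all made up for illustration; they are not the code used on the Netflix data, and the timings will vary with Ruby version and input.

```ruby
require 'benchmark'

# Hypothetical sample data, made up for illustration only.
RECORD = "1488844,3,2005-09-06"
HEADER = "Content-Type: text/html\r\n"

# Built once when the file is loaded and reused on every call.
HEADER_RE = /\A([^:\s]+):\s*(.*?)\s*\z/

# Two String#index calls plus a slice: returns the text between the first
# and second delimiter, or nil if there are fewer than two delimiters.
def middle_by_index(line, delim = ",")
  first = line.index(delim)
  return nil unless first
  start = first + delim.length
  stop = line.index(delim, start)
  return nil unless stop
  line[start...stop]
end

# Split into an array and take element 1. Unlike middle_by_index, this
# also returns the last field when there is only one delimiter.
def middle_by_split(line, delim = ",")
  line.split(delim)[1]
end

# Match against the precompiled constant; returns [name, value] or nil.
def parse_header(line)
  m = HEADER_RE.match(line)
  m && [m[1], m[2]]
end

N = 100_000
Benchmark.bm(28) do |bm|
  bm.report("index + slice (1 field)") { N.times { middle_by_index(RECORD) } }
  bm.report("split (1 field)")         { N.times { middle_by_split(RECORD) } }
  bm.report("split (3 fields)")        { N.times { RECORD.split(",") } }
  bm.report("regex, compiled once")    { N.times { parse_header(HEADER) } }
  bm.report("regex, rebuilt each time") do
    # Regexp.new inside the loop builds a new Regexp object on every pass.
    N.times { Regexp.new('\A([^:\s]+):\s*(.*?)\s*\z').match(HEADER) }
  end
end
```

Run it a few times and with lines shaped like your real input before drawing conclusions; the crossover between index/slice and split depends on how many fields you need and how long the lines are.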

enjoy,

-jeremy

[1] - http://www.netflixprize...

--
========================================================================
Jeremy Hinegardner jeremy@hinegardner.org


snacktime

10/31/2006 5:54:00 AM

On 10/30/06, Jeremy Hinegardner <jeremy@hinegardner.org> wrote:
> On Tue, Oct 31, 2006 at 02:14:42PM +0900, ara.t.howard@noaa.gov wrote:
> > On Tue, 31 Oct 2006, snacktime wrote:
> >
> > >Say I'm parsing stuff like HTTP headers, what is going to give better
> > >performance? Strings with regular expressions? StringIO with readline?
> > >Splitting strings into arrays on a delimiter? Or is it going to be so
> > >close it's not really an issue?
> > >
> > >Chris
> >
> > if you try to write your regular expressions badly enough they can surely
> > use the most cpu ;-)
>
> It all 'depends' :-) If you're doing HTTP header parsing, why not just
> use the header parsing in Mongrel? It's already available as a C
> extension, so it's probably not going to get much faster than that.

I am using it actually, but I'm writing a proxy and I also need to parse
the headers the server returns. I was thinking about just adding
a parser class to the Mongrel parser to do this, based on the existing
one; still not decided, though.