raulparraco
2/17/2009 3:05:00 AM
unsubscribe
--------------------------------------------------
From: "Tony Arcieri" <tony@medioh.com>
Sent: Monday, February 16, 2009 9:58 PM
To: "ruby-talk ML" <ruby-talk@ruby-lang.org>
Subject: Re: best-performing Rss parser
> On Thu, Jan 31, 2008 at 1:20 AM, Sunil Khedar <sunil@truesparrow.com>
> wrote:
>
>> I am working on an RSS parser script. I have to parse thousands and
>> thousands of RSS feeds every hour.
>>
>> I am looking for an optimized parser which can parse all these
>> feeds. Please suggest the RSS parsers you have come across.
>>
>
> Sounds like a case of premature optimization to me. If you intend to do
> anything like stick the data parsed from the feeds into a database or
> search index, I think you'll quickly find that will become the bottleneck,
> rather than the feed processing itself.
>
> My company went through something similar, with a performance-obsessed
> former C++ programmer looking for the fastest feed parsing solution
> available. He settled on building his own, highly procedural feed
> processor around libxml-ruby after benchmarking several of the solutions
> available. However, soon after, he discovered that updating the database
> and search index was a far bigger bottleneck, one he spent the next
> several months addressing. Feed parsing speed went completely by the
> wayside.
>
> If you intend to do any sort of indexing of the feeds at all, you should
> really focus on building a maintainable feed reader, as opposed to a fast
> one. The database and/or search index are going to be your bottleneck
> anyway, so don't let the desire for speed trump things like correctness
> and code clarity. Feed processing is something that scales horizontally
> using a queue and multiple feed reader processes, as opposed to databases
> and search indexes which generally don't scale quite as well.
>
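For what it's worth, the queue-plus-workers shape described above can be sketched in a few lines of plain Ruby. This is only an illustration under assumptions: `run_workers` and its `worker_count:` parameter are hypothetical names, the block you pass in stands in for whatever fetch-and-parse step you actually use, and threads inside one process stand in for the separate feed reader processes (a real deployment would more likely use an external queue).

```ruby
# Hypothetical sketch: a shared work queue drained by several workers.
# Threads stand in here for separate feed reader processes.
def run_workers(urls, worker_count: 4, &handler)
  queue = Queue.new
  urls.each { |url| queue << url }
  queue.close # once closed and empty, pop returns nil and the workers stop

  results = Queue.new
  workers = Array.new(worker_count) do
    Thread.new do
      while (url = queue.pop)
        # handler is the caller-supplied fetch-and-parse step (an assumption)
        results << [url, handler.call(url)]
      end
    end
  end
  workers.each(&:join)

  # Collect the [url, result] pairs into a Hash keyed by URL
  Array.new(results.size) { results.pop }.to_h
end
```

Calling `run_workers(feed_urls, worker_count: 8) { |url| parse_feed(url) }` (with `parse_feed` being your own method) returns a hash of results keyed by URL. Adding capacity then means raising the worker count, or running more reader processes against the same queue.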
> Given that, I would suggest looking at existing solutions like feedtools
> and feedzirra before trying to write your own, and if you do write your
> own, go with Nokogiri. It has a nice, clear, easy-to-use API and is
> relatively fast.
>
> --
> Tony Arcieri
>