Asp Forum - best-performing Rss parser

Ray Chen

8/19/2006 7:17:00 AM

Hi all,

I am working on a project that requires rss parsing for pre-fetched web
pages. I don't need caching or anything fancy, just the rss parsing
itself. I am currently using sporkmonger's feedtools, but I am
wondering if anything out there is better in terms of performance.
Hopefully someone has tested a few parsers. If not, I'll run some
simple tests and post back to the list. Some parsers under
consideration:

http://raa.ruby-lang.org/pr...
http://simple-rss.ruby...
http://sporkmonger.com/projects/...

Thanks
Ray

--
Posted via http://www.ruby-....

11 Answers

Ray Chen

8/19/2006 7:33:00 AM

One more.
http://syndication.rubyforg...


Bob Aman

8/20/2006 2:14:00 AM

> One more.
> http://syndication.rubyforg...

Right now, I'm recommending that people who care about performance use
the UFP instead.

Cheers,
Bob Aman
--
AIM: sporkmonger
Jabber: sporkmonger@jabber.org

Ray Chen

8/28/2006 6:04:00 PM

Just wanted to give everyone who's curious an update on this situation.

I ended up comparing just Feedtools and Syndication, since the others
were not as feature-complete as I would have liked.

In our system tests, Syndication performs a lot better than Feedtools,
but I won't publicize the results here since those are specific to our
system.

Timing just the parsing step for
http://www.digg.com/rss/chandrasonic/... (grabbing 40 entries, 40
titles, 40 dates, etc.), Syndication took 0.062467 seconds of total
time and Feedtools took 4.227067 seconds of total time. This level of
performance difference is highly reproducible, and I am using this
specific feed just to show some numbers.
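
For anyone who wants to repeat this kind of measurement, here is a minimal sketch of timing only the parse step with Ruby's standard Benchmark module. It is not the script used for the numbers above. The `parse_titles` method and the in-memory sample feed are hypothetical stand-ins, so swap in the parser you actually want to measure.

```ruby
require 'benchmark'

# Hypothetical stand-in for a real feed parser: pulls the <title> text out
# of each <item> with regular expressions. Put the real parser call here.
def parse_titles(xml)
  xml.scan(%r{<item>.*?</item>}m).map do |item|
    item[%r{<title>(.*?)</title>}m, 1]
  end
end

# Build a 40-entry sample feed in memory so network time is excluded.
items = (1..40).map { |i| "<item><title>Entry #{i}</title></item>" }.join
SAMPLE_FEED = "<rss version=\"2.0\"><channel>#{items}</channel></rss>"

# Time only the parsing step, repeated to smooth out noise.
RUNS = 100
titles = nil
elapsed = Benchmark.realtime do
  RUNS.times { titles = parse_titles(SAMPLE_FEED) }
end

puts "parsed #{titles.size} titles, #{format('%.6f', elapsed / RUNS)} s per parse"
```

Fetching the feed once and parsing the saved string keeps network latency out of the comparison, which matters when the parsers differ by this much.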

Syndication does have some downsides, however. The programmer must
specify whether a feed is RSS or Atom, since Syndication has no
built-in distinguishing mechanism. Syndication also seems to have
incomplete time-stamp support. Slashdot time stamps, for example,
don't work for me.
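
One possible workaround for the RSS/Atom question is to sniff the root element before handing the document to a parser. The sketch below is only a heuristic and is not part of Syndication. The method name `detect_feed_format` is made up for illustration, and it assumes the root element is unprefixed (or is the usual `rdf:RDF` for RSS 1.0).

```ruby
# Guess the feed format from the first element in the document.
# Returns :rss, :atom, :rdf (RSS 1.0) or :unknown.
# Heuristic only: element-like text inside a leading comment would fool it.
def detect_feed_format(xml)
  # Skip the XML declaration, DOCTYPE and processing instructions,
  # then capture the name of the first real element.
  root = xml[/<(?![?!])([A-Za-z_][\w:.-]*)/, 1]
  case root
  when 'rss'     then :rss
  when 'feed'    then :atom
  when 'rdf:RDF' then :rdf
  else :unknown
  end
end

puts detect_feed_format('<?xml version="1.0"?><rss version="2.0"><channel/></rss>')
```

The result can then be used to pick the RSS or Atom parser class explicitly.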

Disclaimer: I'm not an expert on Ruby, RSS feeds or benchmarking, but
just wanted to share my results. Thanks Bob for all your help along the
way.


Bob Aman

9/4/2006 10:32:00 PM


The results aren't even remotely surprising to me, but yeah, they
really underscore several points:

1) REXML is a performance stinker (and by extension, FeedTools 10x more so)
2) If performance is an issue, a parse-at-all-costs parser is going to
be a problem unless it's written in C or something similarly fast
3) As always, use the right tool for the job -- FeedTools isn't going
to scale, so if you need scalability, don't use it
4) If you're dealing with ~500 feeds or less, you want to always get
the right answers back, you don't mind the size of the library, and
you want a Ruby-only solution, FeedTools is what you want

In case it's not obvious, point #4 has a lot of conditions.

At some point in the distant future, I intend to write a pure C
library that -will- scale, but that's a long way off, because for the
foreseeable future I'm going to be working on GentleCMS.

Cheers,
Bob Aman
--
AIM: sporkmonger
Jabber: sporkmonger@jabber.org

Sunil Khedar

1/31/2008 8:20:00 AM

Hi Bob,

I am working on an RSS parser script. I have to parse thousands and
thousands of RSS feeds every hour.

I am looking for an optimized parser which can parse all these
feeds. Please suggest any RSS parsers you have come across.

Thanks in advance.


Marco Colli

2/14/2009 1:24:00 PM

Sunil Khedar wrote:
> Hi Bob,
>
> I am working on a RSS parser script. Here I have to parser thousands and
> thousands of RSS feeds every hour.
>
> I am looking for a optimized parser which can take parse all these
> feeds. Please suggest the RSS parser you have come across.
>
> Thanks in advance.

Hi, I am looking for a high-performance parser too.
Have you come across any solution?

Thanks!

Trans

2/14/2009 3:04:00 PM

On Feb 14, 8:24 am, Marco Colli <collimarc...@gmail.com> wrote:
> Sunil Khedar wrote:
> > Hi Bob,
>
> > I am working on a RSS parser script. Here I have to parser thousands and
> > thousands of RSS feeds every hour.
>
> > I am looking for a optimized parser which can take parse all these
> > feeds. Please suggest the RSS parser you have come across.
>
> > Thanks in advance.
>
> Hi, I am looking for an high-performance parser too.
> Have you come across any solution?

If speed is the #1 issue, I would try libxml-ruby. It's lower level
than you might want, but that gives you a choice on how to parse, and
it wouldn't be hard to build an RSS-specific layer on top of that if you
really want it.

T.

Michael Fellinger

2/15/2009 11:33:00 AM

On Sat, Feb 14, 2009 at 10:24 PM, Marco Colli <collimarco91@gmail.com> wrote:
> Sunil Khedar wrote:
>> Hi Bob,
>>
>> I am working on a RSS parser script. Here I have to parser thousands and
>> thousands of RSS feeds every hour.
>>
>> I am looking for a optimized parser which can take parse all these
>> feeds. Please suggest the RSS parser you have come across.
>>
>> Thanks in advance.
>
> Hi, I am looking for an high-performance parser too.
> Have you come across any solution?

http://www.rubyinside.com/feedzirra-a-new-ruby-feed-library-built-for-speed...

^ manveru

Tony Arcieri

2/17/2009 2:59:00 AM


On Thu, Jan 31, 2008 at 1:20 AM, Sunil Khedar <sunil@truesparrow.com> wrote:

> I am working on a RSS parser script. Here I have to parser thousands and
> thousands of RSS feeds every hour.
>
> I am looking for a optimized parser which can take parse all these
> feeds. Please suggest the RSS parser you have come across.
>

Sounds like a case of premature optimization to me. If you intend to do
anything like stick the data parsed from the feeds into a database or search
index, I think you'll quickly find that will become the bottleneck, rather
than the feed processing itself.

My company went through something similar, with a performance-obsessed
former C++ programmer looking for the fastest feed parsing solution
available. He settled on building his own, highly procedural feed processor
around libxml-ruby after benchmarking several of the solutions available.
However, soon after he discovered that updating the database and search
index was a far bigger bottleneck, one he spent the next several months
addressing. Feed parsing speed went completely by the wayside.

If you intend to do any sort of indexing of the feeds at all, you should
really focus on building a maintainable feed reader, as opposed to a fast
one. The database and/or search index are going to be your bottleneck
anyway, so don't let the desire for speed trump things like correctness and
code clarity. Feed processing is something that scales horizontally using a
queue and multiple feed reader processes, as opposed to databases and search
indexes which generally don't scale quite as well.
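
To make the queue idea concrete, here is a rough sketch using Ruby's built-in Queue and a small pool of workers. It is an illustration under assumptions, not the setup described above: threads stand in for separate feed reader processes, and `count_items` and `process_feeds` are hypothetical names with a trivial stand-in for real parsing.

```ruby
# A queue of feed bodies and a pool of workers that drain it.
# Threads stand in here for the separate feed reader processes.

# Hypothetical stand-in for a real parser: counts <item> start tags.
def count_items(xml)
  xml.scan(/<item\b/).size
end

def process_feeds(feeds, worker_count: 4)
  jobs = Queue.new
  results = Queue.new
  feeds.each { |name, xml| jobs << [name, xml] }
  jobs.close # once drained, pop returns nil and the workers stop

  workers = Array.new(worker_count) do
    Thread.new do
      while (job = jobs.pop)
        name, xml = job
        results << [name, count_items(xml)]
      end
    end
  end
  workers.each(&:join)

  # Collect [name, count] pairs into a hash.
  Array.new(results.size) { results.pop }.to_h
end

feeds = {
  'a' => '<rss><channel><item/><item/></channel></rss>',
  'b' => '<rss><channel><item></item></channel></rss>'
}
counts = process_feeds(feeds)
puts counts.inspect
```

On MRI, threads will not run CPU-bound parsing in parallel, so a real deployment along these lines would more likely put the queue in an external service and run the workers as separate processes. The shape of the worker loop stays the same.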

Given that, I would suggest looking at existing solutions like feedtools and
feedzirra before trying to write your own, and if you do, go with Nokogiri.
It has a nice, clear, easy-to-use API and is relatively fast.

--
Tony Arcieri

raulparraco

2/17/2009 3:05:00 AM

unsubscribe'

