comp.lang.ruby

Webpage to RSS

Bart Braem

10/9/2006 7:12:00 PM

Does anyone have a Ruby-based script lying around that would transform
updates to a webpage into RSS or some other feed format?
We don't use a CMS for our website, but some items are updated
often and RSS feeds might be appreciated. Someone must have done this
before, I guess.
So is there a script that might do that? The categories are separated by h2
tags and the items are in li tags.

Bart
5 Answers

hemant

10/10/2006 10:40:00 AM


On 10/10/06, Chris Carter <cdcarter@gmail.com> wrote:
> Hi,
> Since this is such a specialized task, it really depends on the website you
> are transforming. I would suggest you take a look at Hpricot
> (http://code.whytheluckystiff.n...) and at the RSS class in the
> standard library. It shouldn't be too hard to roll one up, and we can always
> help. I am usually in #ruby-lang on freenode after 7 every night.
> --
> Chris

The Rails Recipes book by Chad Fowler has something similar, almost ready
to use. If you don't have the book, you can still download the sample
code, I guess.

--
There was only one Road; that it was like a great river: its springs
were at every doorstep, and every path was its tributary.

Lutz Horn

10/10/2006 10:57:00 AM


Hi,

On Oct 9, 9:12 pm, Bart Braem <bart.br...@gmail.com> wrote:
> Does anyone have a ruby based script lying around that would transform
> updates to a webpage to RSS or some other feed format?

You could use hpricot (http://code.whytheluckystiff.ne...) to
parse the HTML and then use feedtools
(http://sporkmonger.com/articles/2005/08/11...) to generate the
RSS.

Lutz
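
For the feed-generation half, the RSS class in the standard library that
Chris mentioned may be enough on its own. Below is a minimal sketch, assuming
Ruby's bundled `rss` library and its RSS::Maker interface; the channel title,
the example.com URLs and the `items` array are made up for illustration and
would come from the HTML parsing step in practice.

```ruby
require "rss"

# Hypothetical items, as they might come out of the HTML parsing step.
items = [
  { title: "Somefiles description",
    link:  "http://example.com/somefile",
    date:  Time.utc(2006, 10, 9) },
]

feed = RSS::Maker.make("2.0") do |maker|
  # RSS 2.0 requires a title, link and description on the channel.
  maker.channel.title       = "Website updates"        # assumed name
  maker.channel.link        = "http://example.com/"    # assumed URL
  maker.channel.description = "Recently added items"

  items.each do |entry|
    maker.items.new_item do |item|
      item.title = entry[:title]
      item.link  = entry[:link]
      item.date  = entry[:date]
    end
  end
end

puts feed
```

Printing `feed` emits the XML, so the script's output can be written straight
to a file served next to the page.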

why the lucky stiff

10/10/2006 5:39:00 PM


On Wed, Oct 11, 2006 at 12:30:16AM +0900, Bart Braem wrote:
> One question though: do you see a way of parsing a structure like this with
> hpricot:
>
> <h3>Structure 1</h3>
> <h4>Substructure 1</h4>
>
> <p>Substructure info</p>
>
> <ul>
>
> <li><a href="somefile">Somefiles description</a>. Addition date.</li>
>
> I can cope with setting a date in the RSS; the problem is parsing this
> structure. There is no surrounding element for the ul and I need both the
> structure and the substructure information because the combination of those
> two defines the effective identity of the ul and its items.
> There seems to be no method to "give everything between two specific tags and
> then go on to the next one"...

I'm not sure I understand exactly, but here's my impression of what you're
trying to do.

require 'hpricot'

doc = Hpricot(html_string)
(doc/:h3).each do |h3|
  rss_title = h3 # okay, so you have the 3rd-level header
  rss_contents = Hpricot::Elements[]

  # walk forward from the header, collecting siblings up to and including the ul
  ele = h3
  while ele = ele.next_sibling
    rss_contents << ele
    break if ele.respond_to?(:name) and ele.name == "ul"
  end
end

So, basically, you can use `next_sibling` (or `previous_sibling`) to walk back
and forth between HTML brothers and sisters. I store it in an Hpricot::Elements
array, since you can then just call `rss_contents.to_html` or do other searches
on it.

This is available since changeset [49], so you'll need to either install from SVN
or monkeypatch.

_why

[49] http://code.whytheluckystiff.net/hpricot/ch...
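
Since each ul is identified by the h3 and h4 that precede it, the same
sibling walk can also carry both headings along. The sketch below shows that
idea in plain Ruby, without Hpricot: `Node` is a hypothetical stand-in for a
parsed element, and the `siblings` array plays the role of the flat run of
elements you would get by following `next_sibling`. It is only meant to show
the bookkeeping, not Hpricot's API.

```ruby
# Hypothetical stand-in for a parsed HTML element: a tag name, its text,
# and (for a ul) the text of its li items.
Node = Struct.new(:name, :text, :items)

# Walk a flat list of siblings, remembering the most recent h3 and h4,
# and emit one record per ul labelled with both headings.
def group_lists(siblings)
  h3 = nil
  h4 = nil
  siblings.each_with_object([]) do |node, groups|
    case node.name
    when "h3"
      h3 = node.text
      h4 = nil # a new structure resets the substructure
    when "h4"
      h4 = node.text
    when "ul"
      groups << { structure: h3, substructure: h4, items: node.items }
    end
  end
end

siblings = [
  Node.new("h3", "Structure 1"),
  Node.new("h4", "Substructure 1"),
  Node.new("p",  "Substructure info"),
  Node.new("ul", nil, ["Somefiles description"]),
  Node.new("h4", "Substructure 2"),
  Node.new("ul", nil, ["Another file"]),
]

groups = group_lists(siblings)
groups.each do |g|
  puts "#{g[:structure]} / #{g[:substructure]}: #{g[:items].join(', ')}"
end
```

Each record then maps naturally onto one feed item or category, with the
two headings combined into its title.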

Bart Braem

10/10/2006 9:26:00 PM


Lutz Horn wrote:

> You could use hpricot (http://code.whytheluckystiff.ne...) to
> parse the HTML and then use feedtools
> (http://sporkmonger.com/articles/2005/08/11...) to generate the
> RSS.

Wow hpricot seems pretty nice, I noticed the hype but now I understand...
One question though: do you see a way of parsing a structure like this with
hpricot:

<h3>Structure 1</h3>
<h4>Substructure 1</h4>

<p>Substructure info</p>

<ul>

<li><a href="somefile">Somefiles description</a>. Addition date.</li>

I can cope with setting a date in the RSS; the problem is parsing this
structure. There is no surrounding element for the ul and I need both the
structure and the substructure information because the combination of those
two defines the effective identity of the ul and its items.
There seems to be no method to "give everything between two specific tags and
then go on to the next one"...

Thanks for the pointers
Bart

Bart Braem

10/11/2006 11:50:00 AM


why the lucky stiff wrote:

> I'm not sure I understand exactly, but here's my impression of what you're
> trying to do.
>
> doc = Hpricot(html_string)
> (doc/:h3).each do |h3|
>   rss_title = h3  # okay, so you have the 3rd-level header
>   rss_contents = Hpricot::Elements[]
>
>   ele = h3
>   while ele = ele.next_sibling
>     rss_contents << ele
>     break if ele.respond_to?(:name) and ele.name == "ul"
>   end
> end
>
> So, basically, you can use `next_sibling` (or `previous_sibling`) to walk
> back and forth between HTML brothers and sisters.  I store it in an
> Hpricot::Elements array, since you can then just call
> `rss_contents.to_html` or do other searches on it.
>
> This is available since changeset [49], so you'll need to either install
> from SVN or monkeypatch.

The next_sibling and previous_sibling methods are just what I needed.
Now for an svn checkout...

Thanks a lot!
Bart