Asp Forum - Easy way for a Nuub to get link-element from a html-source

Marcus Strube

11/26/2007 10:24:00 AM

Hi all,

I'm very new to Ruby and I'm not sure of the easiest way to do this.
I want to read the content of e.g. "www.spiegel.de" and pick out just
this line:

<link rel="alternate" type="application/rss+xml" title="SPIEGEL ONLINE
als RSS-Feed" href="http://www.spiegel.de/schlagzeilen/rss/index... />

and from this line, the "title" and the "href".

Since the order of the attributes in "link" is not guaranteed, a regexp
doesn't look like the first choice, and I couldn't find an HTML::Parse.
--
Posted via http://www.ruby-....

6 Answers

Lee Jarvis

11/26/2007 10:57:00 AM

Marcus Strube wrote:
> since the order in "link" is not sure, it doesnt look like regexp is the
> first choice. and i couldn't find a HTML::Parse.

Check out hpricot.

http://code.whytheluckystiff.ne...
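
If you want to see the idea before installing anything, here is a minimal sketch that uses only Ruby's standard library. It is not hpricot and not a real HTML parser: it grabs each <link> tag and collects its attributes into a hash, so the attribute order does not matter. The helper names (link_attributes, rss_links) and the sample markup are made up for illustration, and it will likely trip over unusual markup (unquoted values, ">" inside an attribute), which is where a proper parser earns its keep.

```ruby
# Stdlib-only sketch: read attributes from <link> tags in any order.
# This is a plain-regexp stand-in for illustration, not the hpricot API.

# Hypothetical helper: returns one hash of attributes per <link> tag.
def link_attributes(html)
  html.scan(/<link\b([^>]*)>/im).map do |(raw_attrs)|
    attrs = {}
    # Collect every name="value" or name='value' pair, whatever the order.
    raw_attrs.scan(/([\w:-]+)\s*=\s*(["'])(.*?)\2/m) do |name, _quote, value|
      # Values may wrap across lines in the source; squeeze the whitespace.
      attrs[name.downcase] = value.gsub(/\s+/, ' ').strip
    end
    attrs
  end
end

# Hypothetical helper: keeps only the RSS "alternate" links.
def rss_links(html)
  link_attributes(html)
    .select { |a| a['rel'].to_s.downcase == 'alternate' && a['type'] == 'application/rss+xml' }
    .map { |a| { title: a['title'], href: a['href'] } }
end

# Made-up sample markup (example.com is a placeholder, not the real page).
SAMPLE_HTML = <<~HTML
  <html><head>
  <link href="http://example.com/feed.xml" title="Example
  Feed" type="application/rss+xml" rel="alternate" />
  <link rel="stylesheet" href="/style.css" />
  </head><body></body></html>
HTML

rss_links(SAMPLE_HTML).each do |link|
  puts "#{link[:title]} -> #{link[:href]}"
end
```

To run it against a live page you would fetch the HTML first (for example with Net::HTTP from the standard library) and pass the body string to rss_links.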

Regards,
Lee

Kai Brust

11/26/2007 11:18:00 AM

On 26.11.2007, at 11:23, Marcus Strube wrote:

> hi all.
>
> im very new to ruby and im not sure how to do this the easiest way in
> ruby. i want to read the content from e.g. "www.spiegel.de" and just
> this line
>
> <link rel="alternate" type="application/rss+xml" title="SPIEGEL ONLINE
> als RSS-Feed" href="http://www.spiegel.de/schlagz...
> index.xml" />
>
> and from this line the "title" and the "href"
>
> since the order in "link" is not sure, it doesnt look like regexp is
> the
> first choice. and i couldn't find a HTML::Parse.

How about hpricot?

http://code.whytheluckystiff.ne...

- Kai Brust

Marcus Strube

11/26/2007 11:38:00 AM

> How about hpricot?
>
> http://code.whytheluckystiff.ne...

OK, hpricot then.

Is it just

gem install hpricot

or do I need to install this "ragel" thing too? (And if so, what is
the best way to do that?)


Peter Szinek

11/26/2007 11:48:00 AM

Marcus Strube wrote:
> hi all.
>
> im very new to ruby and im not sure how to do this the easiest way in
> ruby. i want to read the content from e.g. "www.spiegel.de" and just
> this line
>
> <link rel="alternate" type="application/rss+xml" title="SPIEGEL ONLINE
> als RSS-Feed" href="http://www.spiegel.de/schlagzeilen/rss/index... />
>
> and from this line the "title" and the "href"
>
> since the order in "link" is not sure, it doesnt look like regexp is the
> first choice. and i couldn't find a HTML::Parse.

Another possibility is scRUBYt!:

==========================================
require 'rubygems'
require 'scrubyt'

feed_data = Scrubyt::Extractor.define do
  fetch 'http://www.spiege...

  link "//link[@rel='alternate']" do
    title "title", :type => :attribute
    href "href", :type => :attribute
  end
end

puts feed_data.to_xml
==========================================

output:

==========================================
<root>
  <link>
    <title>SPIEGEL ONLINE als RSS-Feed</title>
    <href>http://www.spiegel.de/schlagzeilen/rss/index.xml<...
  </link>
</root>
==========================================

or, to_hash:

==========================================
[{:title=>"SPIEGEL ONLINE als RSS-Feed",
:href=>"http://www.spiegel.de/schlagzeilen/rss/index...}]
==========================================

Cheers,
Peter
___
http://www.rubyra...
http://s...

Marcus Strube

11/26/2007 1:24:00 PM

> Another possibility is scRUBYt!:

That looks good. Thank you!


Peter Szinek

11/26/2007 1:58:00 PM

Marcus Strube wrote:
>> Another possibility is scRUBYt!:
>
> That looks good. Thank you!

Yes, but the downside (as of the current version; it'll be fixed in
the next one) is that the installation process is not that easy,
mainly if you are on win32. If you still decide to go for scRUBYt!,
we can talk on #scrubyt @ irc.freenode.net or you can ask your
questions in the forum (http://agora.s...).

Cheers,
Peter
___
http://www.rubyra...
http://s...

comp.lang.ruby
