comp.lang.ruby

need script: convert html-text to text

keal

1/4/2006 10:30:00 AM

I have some HTML text. I need to convert it to plain text, without the
HTML tags.

--
Posted via http://www.ruby-....


3 Answers

Gene Tani

1/4/2006 10:40:00 AM


keal wrote:
> i have html-text. i have to convert this text to simple text without
> html-tags.
>
> --
> Posted via http://www.ruby-....

Path of least resistance:

lynx -dump www.myurl

or use links2, or w3m -dump www.myurl

Or, for a high-falutin solution:
http://groups.google.com/group/comp.lang.ruby/browse_frm/thread/e0fb1207f1814c77/37cd5e35a1ffb8d7?q=strip+HTML+tags&rnum=7#37cd5e...

Ross Bamford

1/4/2006 10:49:00 AM

On Wed, 04 Jan 2006 10:30:03 -0000, keal <keal21@mail.ru> wrote:

> i have html-text. i have to convert this text to simple text without
> html-tags.
>

It's tricky; there's more to it than you'd think. The best way is probably
to let Lynx, or another text-mode browser, do it for you, e.g.:

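# Render the page with lynx and return its plain-text dump.
def plain(url)
  `lynx -dump "#{url}"`
end

# Named `text` rather than `p` to avoid shadowing Kernel#p.
text = plain('http://www.google...')
puts text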

Outputs:

[1]Personalised Home | [2]Sign in

[3]A picture of the Braille letters spelling out "Google." Happy Birthday
Louis Braille!

Web [4]Images [5]Groups [6]News [7]Froogle [8]more »

> ... [snip] ...

Of course you'll need lynx installed for that to work, but other text-mode
browsers will do too. Try a Google search.
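
Since the original question is about HTML text already in hand rather than a URL, here is a minimal sketch of the same idea that pipes a string through the browser instead. The helper name html_to_text and its command: parameter are hypothetical (not from this thread), and the sketch assumes lynx is installed and that its -stdin and -dump options behave as documented in its man page. Passing the command as an argument list also avoids interpolating untrusted input into a shell string.

```ruby
require 'open3'

# Hypothetical helper (not from the thread): pipe an HTML string through a
# text-mode browser and return the rendered plain text.
# Assumption: lynx is installed and accepts `-stdin -dump`.
def html_to_text(html, command: %w[lynx -stdin -dump])
  # Passing the command as separate arguments bypasses the shell, so nothing
  # in the input can be interpreted as shell syntax.
  output, status = Open3.capture2(*command, stdin_data: html)
  raise "#{command.first} exited with status #{status.exitstatus}" unless status.success?
  output
end

# Usage (requires lynx on the PATH):
#   puts html_to_text('<h1>Title</h1><p>Some <b>bold</b> text.</p>')
```

The command: parameter is only there so another browser can be swapped in; whatever is passed must read HTML on standard input and write text to standard output.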

Cheers,

--
Ross Bamford - rosco@roscopeco.remove.co.uk

Robert Klemme

1/4/2006 10:51:00 AM

keal wrote:
> i have html-text. i have to convert this text to simple text without
> html-tags.

This is a very low-cost variant; I guess the lynx approach is much more
effective and complete:

ruby -pe '$_.gsub!(%r{</?.*?>}, "")' index.html
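
The one-liner works line by line, so a tag that spans two lines slips through, and entities such as &amp; are left as-is. A rough sketch of a whole-document variant follows; the name strip_tags and the small entity table are illustrative assumptions, not something from this thread, and a regex like this is still no substitute for a real HTML parser or the lynx approach.

```ruby
# Hypothetical helper (not from the thread): crude regex-based tag stripper
# that works on the whole document at once.
ENTITIES = {
  '&amp;' => '&', '&lt;' => '<', '&gt;' => '>', '&quot;' => '"', '&nbsp;' => ' '
}.freeze

def strip_tags(html)
  # Drop <script> and <style> elements together with their contents.
  text = html.gsub(%r{<(script|style)\b.*?</\1\s*>}mi, '')
  # Drop comments, then every remaining tag ([^>] also matches newlines,
  # so tags broken across lines are removed too).
  text = text.gsub(/<!--.*?-->/m, '')
  text = text.gsub(/<[^>]*>/, '')
  # Decode a handful of common entities in a single pass.
  text = text.gsub(/&(?:amp|lt|gt|quot|nbsp);/, ENTITIES)
  # Collapse runs of spaces and tabs, and trim the ends.
  text.gsub(/[ \t]+/, ' ').strip
end

puts strip_tags("<p>Hello <b>world</b> &amp; more</p>")
```

Only five entities are decoded here; anything else (numeric references, &copy; and so on) passes through unchanged.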

Kind regards

robert