Sam Kong
5/10/2005 3:59:00 PM
Tom Reilly wrote:
> Several years ago, one of the members of the group offered me this
> routine which does a pretty good job of
> extracting the text from an HTML page.
>
> #--------------------------------------------------------------------
> # Strip HTML Tags from Line
> #--------------------------------------------------------------------
>
> def striphtml(line)
>   line.gsub(/\n/, ' ').gsub(/<.*?>/, '')
> end
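For reference, here is the quoted routine with comments, applied to a simple one-line string. The sample HTML is made up for illustration; the method body is unchanged from the quote.

```ruby
# Strip HTML tags from a string (routine as quoted above).
def striphtml(line)
  # 1. Turn newlines into spaces so the whole input is a single line.
  # 2. Lazily delete everything from a '<' to the nearest following '>'.
  line.gsub(/\n/, ' ').gsub(/<.*?>/, '')
end

puts striphtml('<p>Hello, <b>world</b>!</p>')  # prints: Hello, world!
```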
Thank you for sharing the code.
However, this code only works on a single line, right?
When I tested it on a whole HTML page by looping over it line by line,
the result was not what I expected.
I probably need a DOM parser... :-(
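A guess at what went wrong, assuming the page had tags broken across lines: in a line-by-line loop, neither half of a split tag contains both `<` and `>`, so `<.*?>` cannot match it. The `gsub(/\n/, ' ')` step in the routine only helps when the whole document is passed in at once, for example the result of `File.read`. The sample HTML below is hypothetical. Even with the whole page, a regex like this is still fooled by things such as `>` inside attribute values or comments, and it keeps the contents of `<script>` and `<style>` blocks, so a real HTML parser may still be the better tool.

```ruby
def striphtml(text)
  text.gsub(/\n/, ' ').gsub(/<.*?>/, '')
end

# Hypothetical page where the <a ...> tag is split across two lines.
html = "<p>Hello, <a\nhref=\"x.html\">world</a>!</p>\n"

# Line by line: "<a" and "href=...>" end up in different calls, so the tag survives.
by_line = html.each_line.map { |l| striphtml(l) }.join
p by_line  # => "Hello, <a href=\"x.html\">world! "

# Whole document: newlines become spaces first, so the split tag is one match.
whole = striphtml(html)  # for a real file, e.g. striphtml(File.read(path))
p whole    # => "Hello, world! "
```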
Sam