comp.lang.ruby

hpricot - parse html

K. R.

1/2/2008 9:13:00 PM

hi @all

I would like to parse HTML code and remove all tags (comments) that start
with <!-- and end with -->

How can I remove these tags with a regex? I am using the gsub! function to
manipulate the string.

Thanks for helping...
--
Posted via http://www.ruby-....

3 Answers

Jim Clark

1/3/2008 3:38:00 AM

Try this...

C:\temp>irb
irb(main):001:0> mystring = "xxx<!-- and end with --> yy <!-- another comment --> zz"
=> "xxx<!-- and end with --> yy <!-- another comment --> zz"
irb(main):002:0> mystring.gsub(/<!--.*?-->/,'')
=> "xxx yy  zz"

Regards,
Jim


Dingding Ye

1/3/2008 10:37:00 AM

You should also handle the \n and \r characters, since a comment can span
more than one line and . does not match \n by default.

So I think the regex should be "<!--(.|\n|\r)*?-->".
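
To make the difference concrete, here is a small sketch comparing the two patterns. The sample string and the variable names are made up for illustration; only the two regexes come from the posts in this thread.

```ruby
# Hypothetical sample input: one comment on a single line,
# and one comment that spans two lines.
html = "a<!-- one -->b<!-- two\nlines -->c"

# Without newline handling, `.` stops at "\n", so the second
# comment is left untouched.
single_line = html.gsub(/<!--.*?-->/, '')

# The alternation suggested above also consumes "\n" and "\r",
# so both comments are removed.
with_newlines = html.gsub(/<!--(.|\n|\r)*?-->/, '')

puts single_line.inspect
puts with_newlines.inspect
```

The first result should still contain the two-line comment, while the second should contain no comments at all.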


Daniel Brumbaugh Keeney

1/3/2008 5:52:00 PM

On Jan 3, 2008 4:37 AM, sishen <yedingding@gmail.com> wrote:
> You should also process the \n, \r char.
>
> So I think the regex should be "<!--(.|\n|\r)*?-->".

Don't forget about the multiline option, which makes . match newlines
too. It's easy: just stick an 'm' after the regexp.
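
A minimal sketch of the multiline option, assuming a made-up input string; the names page, comment, cleaned and result are only for illustration. It also shows the in-place gsub! variant that the original question mentions.

```ruby
# Hypothetical sample input with a comment that spans two lines.
page = "<p>keep</p><!-- drop\nthis --><p>also keep</p>"

# The trailing m lets `.` match newlines in Ruby; *? keeps each
# match as short as possible so separate comments are not merged.
comment = /<!--.*?-->/m

# gsub returns a new string and leaves page unchanged.
cleaned = page.gsub(comment, '')
puts cleaned

# gsub! edits the receiver in place and returns nil when nothing
# was replaced, so do not chain on its return value.
result = cleaned.gsub!(comment, '')
puts result.inspect
```

Because gsub! returns nil when there is nothing left to replace, it is safer to call it for its side effect and keep using the original variable afterwards.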

Daniel Brumbaugh Keeney