hemant
7/23/2007 1:28:00 PM
On 7/23/07, Alex Young <alex@blackkettle.org> wrote:
> hemant wrote:
> > On 7/23/07, Robert Dober <robert.dober@gmail.com> wrote:
> >> On 7/23/07, geetha <sangeetha.geethu05@gmail.com> wrote:
> >> > Hi,
> >> > I am doing a string search in an HTML file using Ruby.
> >> > If the search string contains HTML entities, I should not match that word.
> >> > How can I do this? Can anyone please help me?
> >> >
> >> > regards,
> >> > S.Sangeetha.
> >> >
> >> We might be able to help you better if you post the data and what you
> >> expect to get out from it exactly.
> >>
> >> Robert
> >>
> >> --
> >
> > Robert:
> > If the search string has HTML entities, then do not proceed with the search.
> >
> > Well, it's very hard to define whether a query string has HTML entities
> > or not.
> No, it's not...
Honestly, it's up to the user, unless we are talking about valid XHTML,
whose entities are well defined.
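To make the "valid XHTML" distinction concrete: a string can contain something shaped like an entity (`&chips;`) without it being an entity the markup language actually defines. A minimal sketch, assuming we only count numeric character references and named references from a known list; `KNOWN_ENTITIES` here is a small hypothetical subset, and a real check would load the full XHTML entity sets.

```ruby
require 'set'

# Hypothetical subset of the named entities XHTML defines; a real check
# would load the complete list from the XHTML 1.0 entity sets.
KNOWN_ENTITIES = Set.new(%w[amp lt gt quot apos nbsp copy eacute])

NAMED_REF   = /&([A-Za-z][A-Za-z0-9]*);/      # e.g. &eacute;
NUMERIC_REF = /&#(?:\d+|x[0-9a-fA-F]+);/      # e.g. &#233; or &#xE9;

# True if +str+ contains at least one *defined* entity reference:
# any numeric character reference, or a named one listed in KNOWN_ENTITIES.
def defined_entity?(str)
  return true if str.match?(NUMERIC_REF)
  str.scan(NAMED_REF).any? { |(name)| KNOWN_ENTITIES.include?(name) }
end

p defined_entity?("caf&eacute;")    # => true
p defined_entity?("&#233;")         # => true
p defined_entity?("fish &chips;")   # => false ("chips" is not defined)
p defined_entity?("so what; and <") # => false
```

With a loose, syntax-only pattern, `&chips;` would count as an entity; with a defined-entity list it does not, which is exactly where "up to the user" comes in.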
>
> > For example, would you say the following string has HTML entities?
> >
> > b = "hello world and so what; and < and there we go >"
> >
> > Dunno, yes and no, but if your answer is yes,
> Then you'd be wrong.
>
> irb(main):001:0> require 'rexml/text'
> => true
> irb(main):002:0> re = REXML::Text::REFERENCE
> => /(?:&([\w:][\-\w\d\.:]*);|&#\d+;|&#x[0-9a-fA-F]+;)/
> irb(main):003:0> "this &amp; that" =~ re
> => 5
> irb(main):004:0> "hello world and so what; and < and there we go >" =~ re
> => nil
>
> Admittedly I'm not scanning for all defined HTML entities, just for
> valid XML entities, but given that one's a superset of the other, and
> undefined entity references probably shouldn't occur within a valid HTML
> document anyway, it's good enough for most purposes...
>
I thought about this when I was posting that response, but somehow I
felt the user is not looking for valid HTML, just whether the string
contains HTML entities or not.
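Putting the thread together, here is a minimal sketch of what the original question seems to ask for, assuming the goal is to skip the search whenever the query contains anything shaped like an entity reference (the looser reading above), not only entities HTML actually defines. The pattern is the `REXML::Text::REFERENCE` regex printed in the irb session, copied as a literal (with the redundant `\d` dropped, since `\w` already covers digits) so the sketch runs without REXML; `contains_entity?` and `search_html` are hypothetical helper names.

```ruby
# Copy of the REXML::Text::REFERENCE pattern shown in the irb session:
# named refs (&amp;), decimal refs (&#38;) and hex refs (&#x26;).
ENTITY_REF = /(?:&([\w:][\-\w.:]*);|&#\d+;|&#x[0-9a-fA-F]+;)/

def contains_entity?(str)
  str.match?(ENTITY_REF)
end

# Returns the character offsets of every occurrence of +query+ in +html+,
# or nil when the query itself contains an entity reference.
def search_html(html, query)
  return nil if contains_entity?(query)

  offsets = []
  pos = 0
  while (idx = html.index(query, pos))
    offsets << idx
    pos = idx + 1 # step one char so overlapping matches are found too
  end
  offsets
end

html = "<p>this &amp; that, this and that</p>"
p search_html(html, "this")       # => [3, 20]
p search_html(html, "this &amp;") # => nil
```

In practice `html` would come from something like `File.read("page.html")`; returning `nil` rather than `[]` keeps "refused to search" distinguishable from "searched, found nothing".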