comp.lang.ruby

Re: help me....

Robert Dober

7/23/2007 11:54:00 AM

On 7/23/07, geetha <sangeetha.geethu05@gmail.com> wrote:
> Hi,
> I am doing string search is one html file usign ruby.
> If the seach sting is htmlentities means I have not match that word.
> How can i do it. Please any one help me.....
>
> regards,
> S.Sangeetha.
>
We might be able to help you better if you post the data and exactly
what you expect to get out of it.

Robert

--
I always knew that one day Smalltalk would replace Java.
I just didn't know it would be called Ruby
-- Kent Beck

5 Answers

hemant

7/23/2007 12:32:00 PM

On 7/23/07, Robert Dober <robert.dober@gmail.com> wrote:
> On 7/23/07, geetha <sangeetha.geethu05@gmail.com> wrote:
> > Hi,
> > I am doing string search is one html file usign ruby.
> > If the seach sting is htmlentities means I have not match that word.
> > How can i do it. Please any one help me.....
> >
> > regards,
> > S.Sangeetha.
> >
> We might be able to help you better if you post the data and what you
> expect to get out from it exactly.
>
> Robert
>
> --

Robert:
If the search string has HTML entities, then do not proceed with the search.

Well, it's very hard to define whether a query string has HTML entities
or not. For example, do you consider the following string to have HTML
entities?

b = "hello world and so what; and < and there we go >"

Dunno, yes and no. But if your answer is yes, and string `b` HAS HTML
entities, then:

require 'cgi'

escaped_html = CGI::escapeHTML(b)
if escaped_html != b
  # b contains characters that CGI::escapeHTML turns into entities
end

If you want strict validation of HTML tags, and of whether the query is
valid HTML, then Hpricot may help.
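
The comparison above flags any string that CGI::escapeHTML would change, which is broader than "contains an entity reference". A minimal sketch of that check, wrapped in a helper whose name (needs_html_escaping?) is made up for illustration and is not part of the cgi library:

```ruby
require 'cgi'

# Hypothetical helper: true when the string contains a character that
# CGI.escapeHTML rewrites (such as &, <, > or ").
def needs_html_escaping?(text)
  CGI.escapeHTML(text) != text
end

b = "hello world and so what; and < and there we go >"

puts needs_html_escaping?(b)                  # bare < and > get escaped
puts needs_html_escaping?("this &amp; that")  # the & itself gets escaped
puts needs_html_escaping?("hello world")      # nothing to escape
```

Note that a bare < or > is enough to trigger the check, so it answers "would this string change if escaped?" rather than "does this string already contain an entity?".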





--
Let them talk of their oriental summer climes of everlasting
conservatories; give me the privilege of making my own summer with my
own coals.

http://blog.g...

hemant

7/23/2007 12:51:00 PM

Some tips on asking questions:

1. Have a meaningful subject or else your question will look like spam.
2. Also please respond to answers that people are posting in response
to your question.
3. Follow Robert's suggestions.

--
Let them talk of their oriental summer climes of everlasting
conservatories; give me the privilege of making my own summer with my
own coals.

http://blog.g...

Alex Young

7/23/2007 1:12:00 PM

hemant wrote:
> On 7/23/07, Robert Dober <robert.dober@gmail.com> wrote:
>> On 7/23/07, geetha <sangeetha.geethu05@gmail.com> wrote:
>> > Hi,
>> > I am doing string search is one html file usign ruby.
>> > If the seach sting is htmlentities means I have not match that word.
>> > How can i do it. Please any one help me.....
>> >
>> > regards,
>> > S.Sangeetha.
>> >
>> We might be able to help you better if you post the data and what you
>> expect to get out from it exactly.
>>
>> Robert
>>
>> --
>
> Robert:
> If search string has html entities, then do not proceed with search.
>
> Well, its very hard to define if query string has HTML entities or
> not?
No it's not...

> For example, do you consider following string has HTML entities?
>
> b = "hello world and so what; and < and there we go >"
>
> dunno yes and no, but if your answer is yes,
Then you'd be wrong.

irb(main):001:0> require 'rexml/text'
=> true
irb(main):002:0> re = REXML::Text::REFERENCE
=> /(?:&([\w:][\-\w\d\.:]*);|&#\d+;|&#x[0-9a-fA-F]+;)/
irb(main):003:0> "this &amp; that" =~ re
=> 5
irb(main):004:0> "hello world and so what; and < and there we go >" =~ re
=> nil

Admittedly I'm not scanning for all defined HTML entities, just for
valid XML entities, but given that one's a superset of the other, and
undefined entity references probably shouldn't occur within a valid HTML
document anyway, it's good enough for most purposes...
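
For anyone who wants the same test without loading REXML, here is a rough sketch using a hand-written pattern. The pattern is only modelled on the one printed in the irb session above and is an assumption, not REXML's own constant. The names ENTITY_REFERENCE, contains_entity_reference? and guarded_search are hypothetical, as is the choice to return :skipped.

```ruby
# Hand-written pattern (an assumption, not REXML's constant): matches
# named references like &amp;, decimal ones like &#233; and hex ones
# like &#xE9;.
ENTITY_REFERENCE = /&(?:[A-Za-z_:][\w.:\-]*|#\d+|#x[0-9a-fA-F]+);/

# True when the text contains at least one entity reference.
def contains_entity_reference?(text)
  !!(text =~ ENTITY_REFERENCE)
end

# Hypothetical guard: skip the search when the query itself contains an
# entity reference; otherwise return the index of the first match
# (or nil when the query is not found).
def guarded_search(document, query)
  return :skipped if contains_entity_reference?(query)
  document.index(query)
end

puts contains_entity_reference?("this &amp; that")
puts contains_entity_reference?("hello world and so what; and < and there we go >")
p guarded_search("fish &amp; chips", "&amp;")
p guarded_search("hello world", "world")
```

Like the REXML pattern, this only checks the shape of a reference, not whether the name is a defined HTML entity.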

--
Alex

hemant

7/23/2007 1:28:00 PM

On 7/23/07, Alex Young <alex@blackkettle.org> wrote:
> hemant wrote:
> > On 7/23/07, Robert Dober <robert.dober@gmail.com> wrote:
> >> On 7/23/07, geetha <sangeetha.geethu05@gmail.com> wrote:
> >> > Hi,
> >> > I am doing string search is one html file usign ruby.
> >> > If the seach sting is htmlentities means I have not match that word.
> >> > How can i do it. Please any one help me.....
> >> >
> >> > regards,
> >> > S.Sangeetha.
> >> >
> >> We might be able to help you better if you post the data and what you
> >> expect to get out from it exactly.
> >>
> >> Robert
> >>
> >> --
> >
> > Robert:
> > If search string has html entities, then do not proceed with search.
> >
> > Well, its very hard to define if query string has HTML entities or
> > not?
> No it's not...

Honestly, it's up to the user, unless we are talking about valid XHTML,
which is definitely defined.

>
> > For example, do you consider following string has HTML entities?
> >
> > b = "hello world and so what; and < and there we go >"
> >
> > dunno yes and no, but if your answer is yes,
> Then you'd be wrong.
>
> irb(main):001:0> require 'rexml/text'
> => true
> irb(main):002:0> re = REXML::Text::REFERENCE
> => /(?:&([\w:][\-\w\d\.:]*);|&#\d+;|&#x[0-9a-fA-F]+;)/
> irb(main):003:0> "this &amp; that" =~ re
> => 5
> irb(main):004:0> "hello world and so what; and < and there we go >" =~ re
> => nil
>
> Admittedly I'm not scanning for all defined HTML entities, just for
> valid XML entities, but given that one's a superset of the other, and
> undefined entity references probably shouldn't occur within a valid HTML
> document anyway, it's good enough for most purposes...
>

I thought about this when I was posting that response, but somehow I
felt the user is not looking for valid HTML, just for whether the string
contains HTML entities or not.

John Joyce

7/25/2007 3:01:00 PM

Hpricot is certainly one tool you should consider, along with REXML
and Scrubyt.
Scrubyt is more for web scraping, but if you can scrape it, you can
remove it too.