comp.lang.ruby

Extract/Parse String?

tuyet.ctn

7/6/2005 2:01:00 AM

How do I extract "treeframe1120266500902" from this String class
and store it in a variable to be used later?

irb(main):205:0> puts c

<FRAMESET border=0 frameSpacing=0 rows=26,* frameBorder=0 onload=onLoad();
cols=* onunload=onUnload()><FRAME border=0 name=sidebar_header marginWidth=0
marginHeight=0
src="/araneae/PortfolioAdmin/Sidebar/showSidebarFiltersB?&amp;filterId=0&amp;showHelp=true&amp;common.sessionId=sGCq3td6d5iQGx94yZ9DxA99"
frameBorder=0 noResize scrolling=no><FRAME border=0 name=treeframe1120266500902
marginWidth=4 marginHeight=0 src="/include/frameReady.html" frameBorder=0
noResize></FRAMESET>



irb(main):206:0> puts c.class

String

=> nil

11 Answers

Assaph Mehr

7/6/2005 3:51:00 AM


Use regular expressions
(http://www.ruby-doc.org/docs/ProgrammingRuby/html/int...), then
#scan the string for something that matches. E.g., assuming the format is
always 'treeframe' followed by digits:

irb(main):038:0> c.scan /treeframe\d+/
=> ["treeframe1120266500902"]

You'll get an array with all the results. If you know you have only one
occurrence, you can use String#slice (or String#[]) to get the first
value:

irb(main):037:0> c[/treeframe\d+/]
=> "treeframe1120266500902"
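
To show how the result ends up in a variable for later use, here is a minimal sketch of both calls outside irb. The string below is a shortened, hypothetical stand-in for the poster's c, and the variable names are made up for illustration.

```ruby
# Hypothetical, shortened stand-in for the string `c` from the question.
c = '<FRAME border=0 name=sidebar_header marginWidth=0>' \
    '<FRAME border=0 name=treeframe1120266500902 marginWidth=4>'

# String#scan collects every match into an Array.
all_matches = c.scan(/treeframe\d+/)

# String#[] with a Regexp returns the first match as a String,
# or nil when nothing matches.
frame_name = c[/treeframe\d+/]

# With a capture group and an index, String#[] returns only that group.
frame_id = c[/treeframe(\d+)/, 1]

puts frame_name
puts frame_id
```

Because String#[] returns nil when there is no match, it is worth checking frame_name before using it.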

HTH,
Assaph

Devin Mullins

7/6/2005 4:08:00 AM


tuyet.ctn@mscibarra.com wrote:

>How do I extract "treeframe1120266500902" from this String class
>and store it in a variable to be used later?
>
>
(Almost) Everything in Ruby is an Object, so what you're asking for is
another String object. "treeframe112..." is just a human-readable
representation of that object, and a variable is just a pointer to that
object.

As Assaph said, you can use regexes to get such a String. Look up
String#match, String#scan, or StringScanner with ri, for instance.
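
As a rough sketch of the other two approaches named above: String#match returns a MatchData object, and StringScanner (from the strscan standard library) walks through the string. The sample string is a shortened, hypothetical stand-in for c, and the variable names are illustrative only.

```ruby
require 'strscan'

# Hypothetical, shortened stand-in for the string `c` from the question.
c = '<FRAME name=sidebar_header><FRAME name=treeframe1120266500902 marginWidth=4>'

# String#match returns a MatchData object, or nil when nothing matches.
# Index 1 is the first capture group.
m = c.match(/name=(treeframe\d+)/)
from_match = m ? m[1] : nil

# StringScanner keeps a position in the string. scan_until advances past
# the next match; #matched then returns just the text the pattern matched.
scanner = StringScanner.new(c)
from_scanner = scanner.scan_until(/treeframe\d+/) ? scanner.matched : nil

puts from_match
puts from_scanner
```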

If you plan on parsing a lot of HTML, there are some Ruby HTML parsers.
Michael Neumann's Mechanize has been recommended on this list before,
but that's as much as I know about it.

Devin



Robert Klemme

7/6/2005 6:38:00 AM


Assaph Mehr <assaph@gmail.com> wrote:
> Use regular expressions
> (http://www.ruby-doc.org/docs/ProgrammingRuby/html/int...),
> then #scan the string for something that matches. E.g., assuming the
> format is always 'treeframe' followed by digits:
>
> irb(main):038:0> c.scan /treeframe\d+/
> => ["treeframe1120266500902"]
>
> You'll get an array with all the results. If you know you have only
> one occurrence, you can use String#slice (or String#[]) to get the first
> value:
>
> irb(main):037:0> c[/treeframe\d+/]
> => "treeframe1120266500902"
>
> HTH,
> Assaph

Although that'll work for this particular string, I'd rather think this is a
case for an HTML parser. Apparently the name of a frame is wanted, and an HTML
parser is the safest way to get that info.

Kind regards

robert

tuyet.ctn

7/6/2005 10:27:00 PM


Thank you Assaph!

c[/treeframe\d+/] works beautifully!

I also appreciate your link to the intro.html although I couldn't find
examples of regular expressions.

Thanks everyone else for your suggestions. I appreciate it.

mrt

7/7/2005 12:26:00 PM


> Although that'll work for this particular string, I'd rather think this is a
> case for an HTML parser. Apparently the name of a frame is wanted, and an HTML
> parser is the safest way to get that info.

Agree completely. Regular expressions should not be used to parse HTML
or XML. However, XPath is an excellent alternative to regular
expressions in these cases. In XPath, the expression to get the name of
the frame would be '//frame/@name'.

Since I'm new to Ruby, I have to ask: is there an HTML parser that
supports XPath? I know that LibXML does a great job parsing HTML and I
find XPath to be a terrific way to do it--just about anything you want
to extract becomes a one-liner. Do the Ruby bindings expose this
functionality? If not, is there another library that can do this?

- Mark.

Robert Klemme

7/7/2005 1:23:00 PM


Mark Thomas wrote:
>> Although that'll work for this particular string, I'd rather think
>> this is a case for an HTML parser. Apparently the name of a frame is
>> wanted, and an HTML parser is the safest way to get that info.
>
> Agree completely. Regular expressions should not be used to parse HTML
> or XML. However, XPath is an excellent alternative to regular
> expressions in these cases. In XPath, the expression to get the name
> of the frame would be '//frame/@name'.
>
> Since I'm new to Ruby, I have to ask: is there an HTML parser that
> supports XPath? I know that LibXML does a great job parsing HTML and I
> find XPath to be a terrific way to do it--just about anything you want
> to extract becomes a one-liner. Do the Ruby bindings expose this
> functionality? If not, is there another library that can do this?

REXML can, but then again, it's "just" an XML parser.

Kind regards

robert

james_b

7/7/2005 1:43:00 PM


Mark Thomas wrote:
>>Although that'll work for this particular string, I'd rather think this is a
>>case for an HTML parser. Apparently the name of a frame is wanted, and an HTML
>>parser is the safest way to get that info.
>
>
> Agree completely. Regular expressions should not be used to parse HTML
> or XML. However, XPath is an excellent alternative to regular
> expressions in these cases. In XPath, the expression to get the name of
> the frame would be '//frame/@name'.
>
> Since I'm new to Ruby, I have to ask: is there an HTML parser that
> supports XPath? I know that LibXML does a great job parsing HTML and I
> find XPath to be a terrific way to do it--just about anything you want
> to extract becomes a one-liner. Do the Ruby bindings expose this
> functionality? If not, is there another library that can do this?

REXML, part of the standard library, does XPath. If the source HTML is
not also XML, then you'll need to coerce it so REXML can load it.
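
A minimal sketch of the REXML/XPath route, assuming the markup has already been coerced into well-formed XML. The XHTML below is a hand-cleaned, hypothetical version of the frameset from the question (lowercase tags, quoted attributes, self-closed frames); the original string would not load as-is. XPath is case-sensitive, so '//frame/@name' only matches lowercase frame elements. On recent Ruby versions REXML ships as a bundled gem rather than in the core standard library, but the require line is the same.

```ruby
require 'rexml/document'

# Hypothetical, hand-cleaned XHTML version of the frameset in the question.
xhtml = <<~XML
  <frameset rows="26,*">
    <frame name="sidebar_header" src="/sidebar.html"/>
    <frame name="treeframe1120266500902" src="/include/frameReady.html"/>
  </frameset>
XML

doc = REXML::Document.new(xhtml)

# '//frame/@name' selects the name attribute of every frame element.
# REXML returns Attribute objects; #value gives the attribute's text.
names = REXML::XPath.match(doc, '//frame/@name').map(&:value)

# Pick out the frame whose name starts with "treeframe".
tree_name = names.find { |n| n.start_with?('treeframe') }

puts names.inspect
puts tree_name
```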

Michael Neumann's Mechanize lib bundles up this behavior so that you can
grab an HTML doc and operate on select sections; you can also grab the
resulting REXML document and run arbitrary XPath calls on it too.
Search the ruby-talk archives as this was discussed not too long ago.

James




--

http://www.ru... - The Ruby Documentation Site
http://www.r... - News, Articles, and Listings for Ruby & XML
http://www.rub... - The Ruby Store for Ruby Stuff
http://www.jame... - Playing with Better Toys


Brad Wilson

7/7/2005 1:44:00 PM


On 7/7/05, Mark Thomas <mrt@thomaszone.com> wrote:
> Since I'm new to Ruby, I have to ask: is there an HTML parser that
> supports XPath?

I used tidy to turn HTML into XHTML, and then REXML to navigate and
modify it. I could've turned it back into HTML with tidy again, but
leaving it as XHTML was acceptable for me (parsing HTML elements from
RSS and modifying them for import into a new blog engine).


mrt

7/7/2005 4:25:00 PM


> Michael Neumann's Mechanize lib bundles up this behavior so that you can
> grab an HTML doc and operate on select sections; you can also grab the
> resulting REXML document and run arbitrary XPath calls on it too.
> Search the ruby-talk archives as this was discussed not too long ago.

I saw that comment, but wasn't able to find any documentation for
Mechanize. Sorry if I'm being stupid, but where can I find the
documentation?
- Google searches bring up nothing
- RAA search doesn't find Mechanize
- Rubyforge search brings up project Wee, docs tab is empty, wiki is
blank, homepage has Wee docs but no Mechanize docs.

Sigh... http://search... makes finding documentation for Perl
modules very easy. Is there an equivalent for Ruby Gems?

- Mark.

Michael Neumann

7/7/2005 6:48:00 PM


Mark Thomas wrote:
>>Michael Neumann's Mechanize lib bundles up this behavior so that you can
>>grab an HTML doc and operate on select sections; you can also grab the
>>resulting REXML document and run arbitrary XPath calls on it too.
>>Search the ruby-talk archives as this was discussed not too long ago.
>
>
> I saw that comment, but wasn't able to find any documentation for
> Mechanize. Sorry if I'm being stupid, but where can I find the
> documentation?

Nowhere, as it doesn't exist. And I do not plan to document it, but
I've been told that the www.ruby-web.org project will adopt Mechanize,
and maybe they'll document and improve it.

Take a look at the examples.

Regards,

Michael