comp.lang.ruby

Getting a list of results from one regular expression

tietyt

6/23/2005 2:29:00 AM

Hello, I'm new to Ruby. I've read most of the Pragmatic Programmer
guide but couldn't find anything that explained how to do this.

To summarize my whole question: how do I get EVERY match of a regular
expression (instead of just the first)?

Here's my situation: I've got this long string that contains XML. I
would like to parse it. Specifically, I want to search this string for
all instances of a pattern like /stringAlias="(.*)"/

I'm no pro with regex, but I think that will find a match for a string
that looks like this: stringAlias="BLAH"

And because of the (.*), the result will be BLAH

Now this is all fine and good. But what I can't figure out is how to
get every match in an array (instead of just the first match).

If I have stringAlias="BLAH" ... stringAlias="BLEH", how do I get an
array that is ["BLAH", "BLEH"]?

Keep in mind that there are a dynamic number of matches for
stringAlias="(.*)"


This is the code I wrote to try to do it:

def ...
  @aliases = []
  matchedData = /stringAlias="(.*?)"/.match(@data)
  @aliases = matchedData.to_a
  puts @aliases
end

The length of the array is 2 and the result is this:
stringAlias="OP"
OP

Even though the data is this:
<string RSLDefined="false" active="false" languageId="1"
sortOrder="0" stringAlias="OP">
<stringValue><![CDATA[Open or Pending]]></stringValue>
</string>
<string RSLDefined="false" active="true" languageId="1"
sortOrder="1" stringAlias="1">
<stringValue><![CDATA[Open]]></stringValue>
</string>
<string RSLDefined="false" active="true" languageId="1"
sortOrder="2" stringAlias="2">
<stringValue><![CDATA[Pend]]></stringValue>
</string>
<string RSLDefined="false" active="true" languageId="1"
sortOrder="3" stringAlias="3">
<stringValue><![CDATA[Decline]]></stringValue>
</string>
<string RSLDefined="false" active="true" languageId="1"
sortOrder="4" stringAlias="4">
<stringValue><![CDATA[Complete]]></stringValue>
</string>

6 Answers

Devin Mullins

6/23/2005 2:44:00 AM


tietyt@gmail.com wrote:

>To summarize my whole question: how do I get EVERY match of a regular
>expression (instead of just the first)?
>
>
String#scan

I'm sure there are other ways, though. I just learned about String#scan
today. (Yes, Dave, my copy of the Pickaxe is on its way.)

Devin



C Erler

6/23/2005 2:46:00 AM


I usually use String#scan.

"testwoohootestkaboomtestyutyut".scan(/test../)
=> ["testwo", "testka", "testyu"]



Mark Hubbart

6/23/2005 2:47:00 AM



Regexp#match only returns the first match; the MatchData object behaves
like an array holding the entire match, followed by the subexpression
(capture group) matches. What you want is String#scan (warning, untested):

regexp = /stringAlias="(.*?)"/
matches = @data.scan(regexp)

Since the regexp has a subexpression matcher, that is what will be put
into the array "matches". You'll get an array something like this:

[["OP"],["1"],["2"], ... ]

(each match has its own subarray, since it's a subexpression match)

Check out the docs for String#scan for more info...
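
To make the difference concrete, here is a minimal sketch run against an abbreviated copy of the data from the original post. The variable names (data, first_only, nested, aliases) are made up for illustration, and the flatten step is one possible way to get the plain array the question asked for, not the only one.

```ruby
# Abbreviated sample of the XML from the original post (assumption: only
# the opening <string> tags matter for this pattern).
data = <<XML
<string RSLDefined="false" active="false" languageId="1"
sortOrder="0" stringAlias="OP">
<string RSLDefined="false" active="true" languageId="1"
sortOrder="1" stringAlias="1">
<string RSLDefined="false" active="true" languageId="1"
sortOrder="2" stringAlias="2">
XML

regexp = /stringAlias="(.*?)"/

# Regexp#match stops at the first match; to_a gives [whole match, group 1].
first_only = regexp.match(data).to_a

# String#scan walks the whole string; with a capture group it returns
# one sub-array of captures per match.
nested = data.scan(regexp)

# flatten collapses the one-element sub-arrays into a plain array of strings.
aliases = nested.flatten

p first_only  # => ["stringAlias=\"OP\"", "OP"]
p nested      # => [["OP"], ["1"], ["2"]]
p aliases     # => ["OP", "1", "2"]
```

The first line of output is the same two-element array the original code produced, which is why its length was 2.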

cheers,
Mark


Gavin Kistner

6/23/2005 4:36:00 AM


On Jun 22, 2005, at 8:30 PM, tietyt@gmail.com wrote:
> To summarize my whole question: how do I get EVERY match of a regular
> expression (instead of just the first)?

In addition to the correct response given by others (String#scan),
you might also want to look at the StringScanner class. It gives you
the ability to crawl through a string with successive regexp calls,
where each new call starts at the new 'current' position.

story = <<ENDSTORY
Hello World! There are 3 cats in my house, with 4 feet each.

6 of those 12 feet have 5 claws each; the other 6 feet have 4 claws
each.

Ow, my back. 54 claws need clipping.
ENDSTORY

require 'strscan'
scanner = StringScanner.new( story )

info = []
count_nouns = /(\d+) (\w+)/

until scanner.eos?
  break unless scanner.scan_until( count_nouns )
  tidbit = {
    :full_match => scanner[0],
    :count => scanner[1].to_i,
    :noun => scanner[2]
  }
  info << tidbit
end

require 'pp'
pp info
info.each{ |tidbit|
  puts "Of %7s, I saw %02d" % [ tidbit[:noun], tidbit[:count] ]
}



[{:noun=>"cats", :count=>3, :full_match=>"3 cats"},
{:noun=>"feet", :count=>4, :full_match=>"4 feet"},
{:noun=>"of", :count=>6, :full_match=>"6 of"},
{:noun=>"feet", :count=>12, :full_match=>"12 feet"},
{:noun=>"claws", :count=>5, :full_match=>"5 claws"},
{:noun=>"feet", :count=>6, :full_match=>"6 feet"},
{:noun=>"claws", :count=>4, :full_match=>"4 claws"},
{:noun=>"claws", :count=>54, :full_match=>"54 claws"}]
Of    cats, I saw 03
Of    feet, I saw 04
Of      of, I saw 06
Of    feet, I saw 12
Of   claws, I saw 05
Of    feet, I saw 06
Of   claws, I saw 04
Of   claws, I saw 54



Pit Capitain

6/23/2005 6:32:00 AM


tietyt@gmail.com wrote:
> Here's my situation, I've got this long string that contains XML. I
> would like to parse it. Specifically, I want to search this string for
> all instances of a pattern like /stringAlias="(.*)"/

One additional remark: if the input can contain multiple stringAlias
expressions on one line, the pattern should be /stringAlias="(.*?)"/
(note the question mark). You can see the difference if you match a
string like

str = "stringAlias=\"one\" bla stringAlias=\"two\""

p str.scan( /stringAlias="(.*)"/ )
# => [["one\" bla stringAlias=\"two"]]

p str.scan( /stringAlias="(.*?)"/ )
# => [["one"], ["two"]]
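
A related variation, not from this thread: a negated character class can stand in for the non-greedy quantifier. This is only a sketch, and it assumes the attribute values never contain a double quote themselves; the variable name result is made up for illustration.

```ruby
str = 'stringAlias="one" bla stringAlias="two"'

# [^"]* matches any run of characters that are not a double quote,
# so the capture group cannot run past the closing quote.
result = str.scan(/stringAlias="([^"]*)"/)

p result  # => [["one"], ["two"]]
```

Unlike the dot, [^"] also matches newlines, so a value that wraps across lines would still be captured with this pattern.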

Regards,
Pit


tietyt

6/23/2005 7:15:00 AM


First of all, thanks for all the super-fast help. I've never asked a
technical question anywhere and gotten such a fast response.

Specifically to Pit Capitain:
Thanks for that tip. I just googled that and learned what the .*?
does.
