comp.lang.ruby

Suggestions for a parsing strategy?

Robb

7/19/2008 3:04:00 AM

Hi all,

I have input strings that can look like this:

Common, Commerc(e, ial)

I need to parse these into the three words that this represents:

Common, Commerce, Commercial.

I'm a little new to ruby, and hence wondering what direction would be
best to go in? (.scan, regexes ... something else?) For me, the
complication I'm not sure how to deal with is the two "levels" of the
comma as a separator.

Thanks,
Robb
6 Answers

David Masover

7/19/2008 3:08:00 AM

On Friday 18 July 2008 21:59:56 Robb wrote:
> Hi all,
>
> I have input strings that can look like this:
>
> Common, Commerc(e, ial)
>
> I need to parse these into the three words that this represents:
>
> Common, Commerce, Commercial.
>
> I'm a little new to ruby, and hence wondering what direction would be
> best to go in? (.scan, regexes ... something else?) For me, the
> complication I'm not sure how to deal with is the two "levels" of the
> comma as a separator.

One way would be to find the exceptions first. Replace anything that matches
the

Commerc(e, ial)

pattern with the two words, as the literal string "Commerce, Commercial". Then
you can just do a simple split on the commas, and maybe strip whitespace.
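A minimal sketch of that two-step idea, assuming the parenthetical always directly follows a root word and is never nested. The method name `expand_words` is made up for illustration and does not come from the thread.

```ruby
# Hypothetical helper: expand "Commerc(e, ial)" into "Commerce, Commercial"
# first, then split the whole string on the remaining top-level commas.
def expand_words(input)
  expanded = input.gsub(/(\w+)\(([^)]*)\)/) do
    root = $1
    suffixes = $2
    # Glue the root onto each suffix and rejoin as a plain comma list.
    suffixes.split(",").map { |suffix| root + suffix.strip }.join(", ")
  end
  expanded.split(",").map(&:strip)
end

p expand_words("Common, Commerc(e, ial)")
```

Once the parentheticals are gone, the comma has only one "level" left, so a plain split is enough.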

Eric I.

7/19/2008 3:39:00 AM

On Fri, Jul 18, 2008 at 10:59 PM, Robb <Robb.Shecter@gmail.com> wrote:
> Hi all,
>
> I have input strings that can look like this:
>
> Common, Commerc(e, ial)
>
> I need to parse these into the three words that this represents:
>
> Common, Commerce, Commercial.

This code does a lot of what you describe, provided the parenthetical
only appears at the end of a word.

====

s = "Common, Commerc(e, ial), Computer, Con(ic, ehead, temporary)"

def parse_word_list(s)
  s.scan(/(\w+)(\((.*?)\))?/).map { |root, junk, suffixes|
    [root, suffixes && suffixes.split(", ")]
  }
end

list = parse_word_list(s)

# see what's produced
p list

# use it to generate all words
list.each do |root, suffix_list|
  if suffix_list
    suffix_list.each do |suffix|
      puts "#{root}#{suffix}"
    end
  else
    puts root
  end
end

====

Hope that helps,

Eric

====

LearnRuby.com offers Rails & Ruby HANDS-ON public & ON-SITE workshops.
Please visit http://Lea... for all the details.

David Masover

7/19/2008 4:20:00 AM

On Friday 18 July 2008 22:38:43 Eric I. wrote:

> s.scan(/(\w+)(\((.*?)\))?/).map { |root, junk, suffixes|

This pattern looks really useful... Looking at the docs for scan, it looks
like it can take a block.

Which just leaves one question: Why isn't this an Enumerator in Ruby 1.9? I
don't think the original meaning (of producing an array) is made much more
difficult by the form

s.scan(/.../).to_a

And I suspect that it would most often be useful for things like #map, if not
used in block form outright. Making it an Enumerator would be somewhat more
efficient than building a whole array first -- and more responsive, if it's a
large string.
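For reference, a small sketch of the block form of `scan` mentioned above, using Eric's pattern unchanged. The method name `words_from_block` is invented for this example. With a block, `scan` yields each match's captures as it goes instead of returning an array of them.

```ruby
# Hypothetical helper: collect every expanded word using scan's block form.
def words_from_block(s)
  words = []
  # The block receives the three capture groups of each match in turn.
  s.scan(/(\w+)(\((.*?)\))?/) do |root, _junk, suffixes|
    if suffixes
      suffixes.split(", ").each { |suffix| words << root + suffix }
    else
      words << root
    end
  end
  words
end

p words_from_block("Common, Commerc(e, ial), Computer, Con(ic, ehead, temporary)")
```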

Sebastian Hungerecker

7/19/2008 11:02:00 AM

David Masover wrote:
> Why isn't [the return value of scan] an Enumerator in Ruby 1.9?

Or 1.8.7 for that matter. Yes, I've been asking myself this very same question
since the release of 1.9.


> And I suspect that it would most often be useful for things like #map, if
> not used in block form outright. Making it an Enumerator would be somewhat
> more efficient than building a whole array first -- and more responsive, if
> it's a large string.

Also it'd allow you to use the matchdata object inside map if you need to. The
way it is now you'd have to do:
  string.enum_for(:scan, /re/).map do
    md = Regexp.last_match
    do_something_with md
  end

instead of just
  string.scan(/re/).map do ... end
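
A runnable version of the `enum_for` workaround, as a sketch only: the method name `roots_with_offsets` and the choice to return each root with its starting offset are assumptions made for illustration. It relies on `Regexp.last_match` reflecting the most recent match while the block runs.

```ruby
# Hypothetical helper: pair each root word with the offset where its match starts.
def roots_with_offsets(s)
  # enum_for(:scan, ...) wraps scan's block form in an Enumerator, so map can
  # run once per match and read the MatchData through Regexp.last_match.
  s.enum_for(:scan, /(\w+)(?:\((.*?)\))?/).map do
    md = Regexp.last_match
    [md[1], md.begin(0)]
  end
end

p roots_with_offsets("Common, Commerc(e, ial)")
```

The offsets are something a plain `scan` result does not carry, which is the kind of case where the MatchData object is worth having.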


--
Jabber: sepp2k@jabber.org
ICQ: 205544826

Tachikoma

7/19/2008 11:44:00 AM

On Jul 19, 11:38 am, "Eric I." <rubytrain...@gmail.com> wrote:
>   s.scan(/(\w+)(\((.*?)\))?/).map { |root, junk, suffixes|

Using a non-capturing group, s.scan(/(\w+)(?:\((.*?)\))?/), avoids the "junk"
variable. Your pattern is great and helpful. ^^
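
Put together, the non-capturing variant could look like the sketch below. The method name `all_words` and the use of `flat_map` are choices made here, not something from the posts above.

```ruby
# Hypothetical helper: return the flat list of words for a whole input string.
def all_words(s)
  # (?:...) groups the parenthetical without capturing it, so each match
  # yields just two values: the root and the suffix list (or nil).
  s.scan(/(\w+)(?:\((.*?)\))?/).flat_map do |root, suffixes|
    if suffixes
      suffixes.split(/,\s*/).map { |suffix| root + suffix }
    else
      [root]
    end
  end
end

p all_words("Common, Commerc(e, ial)")
```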

ThoML

7/19/2008 12:05:00 PM

> Making it an Enumerator would be somewhat more
> efficient than building a whole array first -- and more responsive, if it's a
> large string.

There is also the StringScanner class that can be used to return one
match at a time.
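
A rough sketch of the StringScanner approach, assuming the same pattern as earlier in the thread. The method name `entries` and the wrapping `Enumerator` are illustrative choices; StringScanner itself comes from the standard `strscan` library.

```ruby
require "strscan"

# Hypothetical helper: lazily produce [root, suffixes] pairs, one match at a time.
def entries(s)
  Enumerator.new do |yielder|
    scanner = StringScanner.new(s)
    until scanner.eos?
      # StringScanner#scan only matches at the current position.
      if scanner.scan(/(\w+)(?:\((.*?)\))?/)
        yielder << [scanner[1], scanner[2]]
      else
        scanner.getch # skip separators such as commas and spaces
      end
    end
  end
end

e = entries("Common, Commerc(e, ial)")
p e.next
p e.next
```

Nothing past the current match is examined until the next value is requested, which addresses the responsiveness concern for large strings.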