comp.lang.ruby

New at regexp and Ruby need help on parsing a string.

Gabra Kadabra

11/23/2007 8:28:00 AM

I'm building a little test console for a Ruby project. When using a
function I might get something like this:

input_string ="and stuff and nice things not bad girls not greasy boys
and girlsandboys"

As you already have guessed, I want the following in some kind of
format:

smoking_table = {"and" => ["stuff", "nice things", "girlsandboys"],
"not" => ["bad girls","greasy boys"]}

Thus, a regexp that splits a string on code words like "and" and "not"
is what I need.

Please help me
--
Posted via http://www.ruby-....

9 Answers

7stud --

11/23/2007 10:35:00 AM

Gabra Kadabra wrote:
> I'm building a little test console for a ruby project. When using a
> function I might get something like this:
>
> input_string ="and stuff and nice things not bad girls not greasy boys
> and girlsandboys"
>
> As you already have guessed, I want the following in some kind of
> format:
>
> smoking_table = {"and" => ["stuff", "nice things", "girlsandboys"],
> "not" => ["bad girls","greasy boys"]}
>
> Thus, a regexp that splits a string on code words like "and" and "not"
> is what I need.
>
> Please help me

Try this:

str = 'and stuff and nice things not bad girls not greasy boys
and girlsandboys'

smoking_table = {'and'=>[], 'not'=>[]}

pieces = str.split(/(and |not )/)
len = pieces.length

index = 0
while index < len

case pieces[index]
when 'and '
smoking_table['and'] << pieces[index+1].strip
index +=2
when 'not '
smoking_table['not'] << pieces[index+1].strip
index += 2
else
index += 1
end

end

p smoking_table


--
Posted via http://www.ruby-....

Raul Raul

11/23/2007 10:48:00 AM

Gabra Kadabra wrote:
> I'm building a little test console for a ruby project. When using a
> function I might get something like this:
>
> input_string ="and stuff and nice things not bad girls not greasy boys
> and girlsandboys"
>
> As you already have guessed, I want the following in some kind of
> format:
>
> smoking_table = {"and" => ["stuff", "nice things", "girlsandboys"],
> "not" => ["bad girls","greasy boys"]}
>
> Thus, a regexp that splits a string on code words like "and" and "not"
> is what I need.
> Please help me

# One possible implementation is:

smoking_table = { :and => [], :not => [] }

str.scan(/ (and|not) (.*?) (?= \band|\bnot|$) /x) do |k, v|
smoking_table[k.to_sym].push(v.strip)
end

=> {:and => ["stuff", "nice things", "girlsandboys"],
:not => ["bad girls", "greasy boys"]}
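In case the pattern is hard to read at first, here is a sketch of the same scan with the /x pattern spread over several lines and commented. The sample string (with its line break) and the variable name "pattern" are my own assumptions for illustration; the regexp itself is the one above.

```ruby
# Sample input, assumed to contain the line break from the original post.
str = "and stuff and nice things not bad girls not greasy boys\nand girlsandboys"

smoking_table = { :and => [], :not => [] }

# With /x, whitespace and "#" comments inside the pattern are ignored.
pattern = /
  (and|not)          # group 1: the keyword that opens a piece
  (.*?)              # group 2: as little text as possible after it
  (?=\band|\bnot|$)  # look ahead: stop before the next keyword or at end of line
/x

# scan yields one [keyword, text] pair per match.
str.scan(pattern) do |k, v|
  smoking_table[k.to_sym].push(v.strip)
end

p smoking_table
```

The lookahead does not consume anything, so the next match can start right at the following keyword.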


I hope that this works for you,

Raul
--
Posted via http://www.ruby-....

7stud --

11/23/2007 10:50:00 AM

7stud -- wrote:
>
> Try this:
>
> str = 'and stuff and nice things not bad girls not greasy boys
> and girlsandboys'
>
> smoking_table = {'and'=>[], 'not'=>[]}
>
> pieces = str.split(/(and |not )/)
> len = pieces.length
>
> index = 0
> while index < len
>
> case pieces[index]
> when 'and '
> smoking_table['and'] << pieces[index+1].strip
> index +=2
> when 'not '
> smoking_table['not'] << pieces[index+1].strip
> index += 2
> else
> index += 1
> end
>
> end
>
> p smoking_table

Normally when you split() a string, you do something like this:

str = 'aXbXc'
pieces = str.split('X')
p pieces
-->["a", "b", "c"]

Notice that the pattern you use to split the string is not part of the
results; it's chopped out of the string and the pieces are what's left
over. However, there is a little-known feature: if your split
pattern has a group in it, which is formed by putting parentheses around
part of the pattern, then the group will be returned in the results. I
used parentheses around the whole split pattern to get a result array
like this:

["", "and ", "stuff ", "and ", "nice things ", "not ", "bad girls ",
"not ", "greasy boys\n", "and ", "girlsandboys"]

By including the split pattern in the results, you can see that each
piece of the string is preceded by either 'and ' or 'not '. The 'and '
or 'not ' then serves as an identifier for each piece of the string.
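
As a rough sketch of that idea, the keyword/piece pairs can also be walked two at a time instead of with an index. This is only an alternative to the while loop, not the code I posted; the names "table" and "pieces" are just for illustration, and the sample string assumes the same line break as above.

```ruby
str = "and stuff and nice things not bad girls not greasy boys\nand girlsandboys"

# The capturing group keeps each delimiter in the result array.
pieces = str.split(/(and |not )/)

table = { 'and' => [], 'not' => [] }

# The first element is the empty string before the first keyword; drop it,
# then take the remaining elements as [keyword, text] pairs.
pieces.drop(1).each_slice(2) do |key, value|
  table[key.strip] << value.strip
end

p table
```

Note that this assumes the string starts with a keyword, as in the example. A word that merely ends in "and " or "not " would also be split on.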

--
Posted via http://www.ruby-....

Gabra Kadabra

11/23/2007 1:10:00 PM

Raul Parolari wrote:
>
> str.scan(/ (and|not) (.*?) (?= \band|\bnot|$) /x) do |k, v|
> smoking_table[k.to_sym].push(v.strip)
> end

I think Raul just convinced me that I really need to start a deep
relationship with regexp.
This is magic in one row, readable in three.

Thanks.
--
Posted via http://www.ruby-....

Peter Vanderhaden

11/23/2007 3:29:00 PM

Raul,
Interesting solution. One question, how did you print the output? I'm
a newbie and the output I got when I tried your solution came out like:

andstuffnice thingsgirlsboysnotbad girlsgreasy boys

I used puts smoking_table. I'm assuming that's not the correct way to
do it.
Thanks,
PV

--
Posted via http://www.ruby-....

7stud --

11/23/2007 6:09:00 PM

Gabra Kadabra wrote:
> Raul Parolari wrote:
>>
>> str.scan(/ (and|not) (.*?) (?= \band|\bnot|$) /x) do |k, v|
>> smoking_table[k.to_sym].push(v.strip)
>> end
>
> I think Raul just convinced me that I really need to start a deep
> relationship with regexp.
> This is magic in one row, readable in three.
>

Don't be fooled by one-liners. Ruby syntax lets you chain multiple
method calls together compactly, yet the result can be very
inefficient. Whenever I see a one-liner with multiple method calls
chained together and regexes sprinkled in for good measure, I immediately
assume there is a more efficient solution. The solution I posted is a
case in point: even though it has five times the number of lines, it is
70% faster on my system than the one-liner you find so alluring.

In addition, I find one-liners hard to decipher, and since I don't
aspire to write hard-to-read code that is also inefficient, I rarely try
to cram a whole program into a single line.
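
If you want to check a claim like that on your own machine, a sketch with the standard Benchmark library might look like this. The two lambdas are simplified stand-ins for the split and scan approaches (not the exact code from either post), the iteration count is arbitrary, and the timings will differ by Ruby version and system, so no particular result is implied.

```ruby
require 'benchmark'

sample = "and stuff and nice things not bad girls not greasy boys\nand girlsandboys"

# Stand-in for the split-based approach.
with_split = lambda do |str|
  table = { 'and' => [], 'not' => [] }
  str.split(/(and |not )/).drop(1).each_slice(2) do |key, value|
    table[key.strip] << value.strip
  end
  table
end

# Stand-in for the scan-based approach.
with_scan = lambda do |str|
  table = { 'and' => [], 'not' => [] }
  str.scan(/(and|not)(.*?)(?=\band|\bnot|$)/) { |k, v| table[k] << v.strip }
  table
end

n = 20_000  # arbitrary repeat count
Benchmark.bm(6) do |x|
  x.report('split') { n.times { with_split.call(sample) } }
  x.report('scan')  { n.times { with_scan.call(sample) } }
end
```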

>>Peter Vanderhaden wrote:
>>
>> I used puts smoking_table. I'm assuming that's not the correct
>> way to do it.

Use the p method instead of puts to get the nice hash format.

--
Posted via http://www.ruby-....

Raul Raul

11/23/2007 6:23:00 PM

Gabra Kadabra wrote:
> Raul Parolari wrote:
>>
>> str.scan(/ (and|not) (.*?) (?= \band|\bnot|$) /x) do |k, v|
>> smoking_table[k.to_sym].push(v.strip)
>> end
>
> I think Raul just convinced me that I really need to start a deep
> relationship with regexp.
> This is magic in one row, readable in three.
>
> Thanks.

I totally agree with you; this is not a subject that you learn by 'just
trying' or even by reading the forum. Start calmly from the basics with
a good book, and soon those funny hieroglyphics will become your
friends.

By the way, the code above did not deal with 'notorious bad girls' (I
mean words beginning with 'not'); I had only checked for a word boundary
before the keyword, not after it. So, here it is (the '\b' before and
after a word makes sure that it is indeed a 'word'):

str = "and stuff and nice things not notorious bad girls not greasy boys
and girslsandboys"

h = { :and => [], :not => [] }

str.scan(/ (and|not) (.*?) (?= (\b(and|not)\b)|$) /x) do |k, v|
h[k.to_sym].push(v.strip) }
end

p h # => {:and=>["stuff", "nice things", "girslsandboys"],
# :not=>["notorious bad girls", "greasy boys"]}
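
To see what the trailing '\b' changes, here is a small side-by-side sketch. The names "loose" and "strict" and the shortened sample string are assumptions for illustration only; "strict" uses a non-capturing group (?:...) in the lookahead so that scan yields just two groups.

```ruby
sample = "and stuff not notorious bad girls not greasy boys and girlsandboys"

# Boundary only before the keyword: "notorious" is taken for a "not".
loose  = /(and|not)(.*?)(?=\band|\bnot|$)/
# Boundary on both sides: only whole words count as keywords.
strict = /(and|not)(.*?)(?=\b(?:and|not)\b|$)/

loose_pairs  = sample.scan(loose).map  { |k, v| [k, v.strip] }
strict_pairs = sample.scan(strict).map { |k, v| [k, v.strip] }

p loose_pairs   # contains ["not", ""] and ["not", "orious bad girls"]
p strict_pairs  # contains ["not", "notorious bad girls"]
```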


Peter Vanderhaden wrote
> Interesting solution. One question, how did you print the output? I'm
> a newbie and the output I got when I tried your solution came out ..

By default, the puts/print methods for hashes concatenate keys and
values; you can use 'p h' (or 'puts h.inspect') to see the hash. If you
are in irb, just writing the name of the hash will show it to you.
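
A small sketch of the difference, with a made-up hash. As far as I know the run-together output comes from Hash#to_s in Ruby 1.8; in 1.9 and later, to_s on a hash gives the same text as inspect, so the variable "flattened" below only imitates the old behaviour. The exact inspect format also varies between Ruby versions.

```ruby
h = { 'and' => ['stuff', 'nice things'], 'not' => ['bad girls'] }

p h             # prints the inspect form, braces and all
puts h.inspect  # same text as p h

shown = h.inspect  # the string that p prints

# Imitation of the old run-together output: keys and values joined
# with no separators.
flattened = h.to_a.flatten.join
puts flattened
```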

Regards
Raul

--
Posted via http://www.ruby-....

Raul Raul

11/23/2007 6:36:00 PM

When I typed the final solution, an unwanted '}' got in. Here is the
code again:

str.scan(/ (and|not) (.*?) (?= (\b(and|not)\b)|$) /x) do |k, v|
h[k.to_sym].push(v.strip)
end

Regards
Raul

--
Posted via http://www.ruby-....

RichardOnRails

11/24/2007 9:30:00 PM

On Nov 23, 10:29 am, Peter Vanderhaden <bostonanti...@yahoo.com>
wrote:
> Raul,
> Interesting solution. One question, how did you print the output? I'm
> a newbie and the output I got when I tried your solution came out like:
>
> andstuffnice thingsgirlsboysnotbad girlsgreasy boys
>
> I used puts smoking_table. I'm assuming that's not the correct way to
> do it.
> Thanks,
> PV

p smoking_table

(Same as 7stud's example).

HTH,
Richard