comp.lang.ruby

regular expressions help

Vivek

7/12/2008 6:20:00 PM

Hi,
How do I split the string below into words? A word can be either a
consecutive run of non-whitespace characters or anything within " ".

'hi hello "hello world" hey yo'

should return
[hi, hello, hello world,hey,yo]


I tried to do this with collect, but I'm not sure whether there is a way to
retain a variable between two invocations, concatenate the pieces, and
return them as one string.
Of course, if there is a smart way to do it in one shot using a regex,
then I can do a scan on the string.

80 Answers

Phlip

7/12/2008 7:16:00 PM


> 'hi hello "hello world" hey yo'
>
> should return
> [hi, hello, hello world,hey,yo]

'hi hello "hello world" hey yo'.scan(/\w+/)

=> ["hi", "hello", "hello", "world", "hey", "yo"]

Sorry I couldn't find a more verbose way. Maybe there is one!

Axel

7/12/2008 8:13:00 PM


If you can't find anything better, you might want to try:

str = 'hi hello "hello world" hey yo'
str.gsub!( / \" [^\"]* \" /x ) {|e| e[1..-2].gsub(' ', "\007") }
result = str.scan( / [\w\007]+ /x ).map {|e| e.gsub("\007", " ") }
p result

Regards,
Axel

Phlip

7/12/2008 8:53:00 PM


Axel wrote:

> str = 'hi hello "hello world" hey yo'
> str.gsub!( / \" [^\"]* \" /x ) {|e| e[1..-2].gsub(' ', "\007") }
> result = str.scan( / [\w\007]+ /x ).map {|e| e.gsub("\007", " ") }
> p result

str = 'hi hello "hello world" hey yo'
p str.scan(/(".*")|(\w+)/).flatten.compact

=> ["hi", "hello", "hello world", "hey", "yo"]

Greedy matching to the rescue!

Phlip

7/12/2008 8:57:00 PM


> str = 'hi hello "hello world" hey yo'
> p str.scan(/(".*")|(\w+)/).flatten.compact
>
> => ["hi", "hello", "hello world", "hey", "yo"]
>
> Greedy matching to the rescue!

Also, non-capturing groups help us remove the .flatten.compact nonsense:

p str.scan(/(?:".*")|(?:\w+)/)

=> ["hi", "hello", "\"hello world\"", "hey", "yo"]

I'm not sure why one version captured the "" marks and the other did not...
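
A small sketch of how scan treats capturing versus non-capturing groups may help with that question. It uses only String#scan, Array#flatten and Array#compact; the variable names are made up for illustration.

```ruby
str = 'hi hello "hello world" hey yo'

# With capturing groups, scan returns one sub-array per match, holding
# one slot per group (nil for the group that did not take part).
with_groups = str.scan(/(".*")|(\w+)/)
p with_groups

# flatten.compact removes the nesting and the nils, nothing else.
# The quote marks sit inside group 1, so they are still in the result.
p with_groups.flatten.compact

# With non-capturing groups, scan returns each whole match as a string.
without_groups = str.scan(/(?:".*")|(?:\w+)/)
p without_groups
```

Both forms should yield the same five strings, quotes included, because in both patterns the quote characters are part of the text that gets returned.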

David A. Black

7/12/2008 9:02:00 PM


Hi --

On Sun, 13 Jul 2008, phlip wrote:

> Axel wrote:
>
>> str = 'hi hello "hello world" hey yo'
>> str.gsub!( / \" [^\"]* \" /x ) {|e| e[1..-2].gsub(' ', "\007") }
>> result = str.scan( / [\w\007]+ /x ).map {|e| e.gsub("\007", " ") }
>> p result
>
> str = 'hi hello "hello world" hey yo'
> p str.scan(/(".*")|(\w+)/).flatten.compact
>
> => ["hi", "hello", "hello world", "hey", "yo"]

That's not quite the result, though:

>> str = 'hi hello "hello world" hey yo'
=> "hi hello \"hello world\" hey yo"
>> str.scan(/(".*")|(\w+)/).flatten.compact
=> ["hi", "hello", "\"hello world\"", "hey", "yo"]

The "'s are returned as part of the string '"hello world"'. Also, you
get the wrong result if you have two quoted strings in a row, because
of the greediness:

>> str = 'one "two" "three" four'
=> "one \"two\" \"three\" four"
>> str.scan(/(".*")|(\w+)/).flatten.compact
=> ["one", "\"two\" \"three\"", "four"] # only three strings

Try this:

str.scan(/"([^"]+)"|(\w+)/).flatten.compact

Of course this assumes no embedded/escaped/nested "'s, etc.
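
As a quick check, here is a minimal sketch that runs that pattern against both sample strings from this thread. The helper name split_words is hypothetical, and the same assumption applies: no escaped or nested quotes.

```ruby
# Hypothetical helper wrapping the pattern above.
def split_words(str)
  # Group 1: the inside of a quoted phrase (quotes left out of the group).
  # Group 2: a bare word.
  # flatten.compact drops the nil left by whichever group did not match.
  str.scan(/"([^"]+)"|(\w+)/).flatten.compact
end

p split_words('hi hello "hello world" hey yo')
p split_words('one "two" "three" four')
```

Because [^"]+ cannot cross a quote character, each quoted phrase ends at its own closing quote, and the quotes stay outside the capture.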


David

--
Rails training from David A. Black and Ruby Power and Light:
Intro to Ruby on Rails July 21-24 Edison, NJ
Advancing With Rails August 18-21 Edison, NJ
See http://www.r... for details and updates!

David A. Black

7/12/2008 9:05:00 PM


Hi --

On Sun, 13 Jul 2008, phlip wrote:

>> str = 'hi hello "hello world" hey yo'
>> p str.scan(/(".*")|(\w+)/).flatten.compact
>>
>> => ["hi", "hello", "hello world", "hey", "yo"]
>>
>> Greedy matching to the rescue!
>
> Also, non-capturing groups help us remove the .flatten.compact nonsense:
>
> p str.scan(/(?:".*")|(?:\w+)/)
>
> => ["hi", "hello", "\"hello world\"", "hey", "yo"]
>
> I'm not sure why one version captured the "" marks and the other did not...

They both did :-) (See my previous post.)


David


Bill Kelly

7/12/2008 9:18:00 PM



From: "phlip" <phlip2005@gmail.com>
>
> p str.scan(/(?:".*")|(?:\w+)/)
>
> => ["hi", "hello", "\"hello world\"", "hey", "yo"]

Probably want:

str.scan(/(?:"[^"]*")|(?:\w+)/)

...else the greediness will extend over multiple quoted
strings...

'hi hello "hello world" hey yo "marmoset knocked you out" foo bar'
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

vs.

'hi hello "hello world" hey yo "marmoset knocked you out" foo bar'
          ^^^^^^^^^^^^^


> I'm not sure why one version captured the "" marks and the
> other did not...

Strange... They both did, on my system...(?)

BTW, in Ruby 1.9, we have lookbehind, so we can avoid picking
up the quotes, with:

str.scan(/(?:(?<=")[^"]*(?="))|(?:\w+)/)
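
One caveat with the lookaround form, shown as a hedged sketch: the lookarounds only require a quote on each side and do not check that the two quotes belong together. The text between a closing quote and the next opening quote also qualifies, so with two quoted strings in a row the gap can come back as a token. The variable names below are made up, and lookbehind needs Ruby 1.9 or later.

```ruby
pattern = /(?:(?<=")[^"]*(?="))|(?:\w+)/

# One quoted phrase: the result has the phrase without its quotes.
single = 'hi hello "hello world" hey yo'.scan(pattern)
p single

# Two quoted phrases in a row: the single space between them is also
# preceded and followed by a quote, so it matches the first alternative.
double = 'one "two" "three" four'.scan(pattern)
p double
```

If that stray token matters, a capture group that consumes both quotes avoids the problem, because each quote can then belong to only one match.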


Regards,

Bill



Phlip

7/12/2008 9:22:00 PM


David A. Black wrote:

>> => ["hi", "hello", "hello world", "hey", "yo"]
>
> That's not quite the result, though:

I suspect I copied the wrong line from my transcript!

But...

> The "'s are returned as part of the string '"hello world"'. Also, you
> get the wrong result if you have two quoted strings in a row, because
> of the greediness:

str = 'hi hello "hello world" "hey yo"'
p str.scan(/(?:".*")|(?:\w+)/)

=> ["hi", "hello", "\"hello world\" \"hey yo\""] # bad

p str.scan(/(?:".*?")|(?:\w+)/)

=> ["hi", "hello", "\"hello world\"", "\"hey yo\""] # good!

(-:

> str.scan(/"([^"]+)"|(\w+)/).flatten.compact

The non-greedy matcher .*? looks cuter.

> Of course this assumes no embedded/escaped/nested "'s, etc.

Using regexps as real language parsers makes certain baby deities cry...
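
For input that does contain escaped quotes, one alternative to a hand-written regexp is the Shellwords module from Ruby's standard library. This is a different technique from the scan patterns in this thread, offered as a sketch. It assumes shell-style quoting rules are acceptable: it splits on whitespace, also treats single quotes and backslashes as special, and raises ArgumentError on an unmatched quote.

```ruby
require 'shellwords'

# Plain case from the original question.
plain = Shellwords.split('hi hello "hello world" hey yo')
p plain

# Backslash-escaped quotes inside a double-quoted phrase are unescaped.
# (In a single-quoted Ruby literal, \" is a backslash followed by a quote.)
escaped = Shellwords.split('say "a \"quoted\" word" done')
p escaped
```

Since it splits on whitespace rather than matching \w+, punctuation stays attached to its word, which is closer to the "non-whitespace characters" wording of the original question.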

--
Phlip

Dave Bass

7/12/2008 9:28:00 PM


phlip wrote:
>> 'hi hello "hello world" hey yo'
>>
>> should return
>> [hi, hello, hello world,hey,yo]
>
> 'hi hello "hello world" hey yo'.scan(/\w+/)
>
> => ["hi", "hello", "hello", "world", "hey", "yo"]

But this returns "hello world" as two entries, not one as required.
--
Posted via http://www.ruby-....

David A. Black

7/12/2008 9:32:00 PM


Hi --

On Sun, 13 Jul 2008, phlip wrote:

> David A. Black wrote:
>
>>> => ["hi", "hello", "hello world", "hey", "yo"]
>>
>> That's not quite the result, though:
>
> I suspect I copied the wrong line from my transcript!
>
> But...
>
>> The "'s are returned as part of the string '"hello world"'. Also, you
>> get the wrong result if you have two quoted strings in a row, because
>> of the greediness:
>
> str = 'hi hello "hello world" "hey yo"'
> p str.scan(/(?:".*")|(?:\w+)/)
>
> => ["hi", "hello", "\"hello world\" \"hey yo\""] # bad
>
> p str.scan(/(?:".*?")|(?:\w+)/)
>
> => ["hi", "hello", "\"hello world\"", "\"hey yo\""] # good!

I don't think the OP wanted the literal quotation marks as part of the
results, though. In other words you'd want the third string to be:

hello world

rather than

"hello world"
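
A minimal sketch of one way to get that with the non-greedy form: move the quotes outside a capture group, then flatten and compact as before. The second pattern swaps \w+ for \S+, an assumption based on the original post's "non-whitespace characters" wording, so that punctuation stays attached to bare words. Variable names are made up.

```ruby
str = 'hi hello "hello world" "hey yo"'

# The quotes are matched but sit outside the capture group,
# so they are not part of what scan returns.
words = str.scan(/"(.*?)"|(\w+)/).flatten.compact
p words

# Variant: treat any run of non-whitespace as a bare word.
# The quoted alternative is tried first at each position.
loose = 'hi, hello "hello world" hey-yo'.scan(/"(.*?)"|(\S+)/).flatten.compact
p loose
```

As with the other patterns in this thread, this assumes no escaped or nested quotes.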


David
