comp.lang.ruby

Splitting strings on spaces, unless inside quotes

Richard Livsey

1/7/2006 12:09:00 AM

I want to split a string into words, but group quoted words together
such that...

some words "some quoted text" some more words

would get split up into:

["some", "words", "some quoted text", "some", "more", "words"]

So far I'm drawing a blank on the 'Ruby way' to do this and the only
solutions I can think of are turning out to be fairly ugly.

Any advice would be great. Thanks in advance.

--
R.Livsey
http://...



16 Answers

ES

1/7/2006 1:00:00 AM


On 2006.01.07 09:08, Richard Livsey wrote:
> I want to split a string into words, but group quoted words together
> such that...
>
> some words "some quoted text" some more words
>
> would get split up into:
>
> ["some", "words", "some quoted text", "some", "more", "words"]
>
> So far I'm drawing a blank on the 'Ruby way' to do this and the only
> solutions I can think of are turning out to be fairly ugly.
>
> Any advice would be great. Thanks in advance.

Naively, you can try something like this:

s = 'foo bar "baz quux" roo'
s.scan(/(?:"")|(?:"(.*[^\\])")|(\w+)/).flatten.compact

Elaborate as necessary (add support for single quotes or something).
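
One caveat with the pattern above: `.*` is greedy, so when a line holds more than one quoted phrase it matches from the first quote to the last. A minimal sketch of the difference, assuming a lazy `.*?` is acceptable for the input (the sample string and the names `greedy` and `lazy` are made up for illustration):

```ruby
s = 'foo "bar baz" qux "quux corge"'

# Pattern as posted: the greedy .* runs from the first quote to the last one.
greedy = s.scan(/(?:"")|(?:"(.*[^\\])")|(\w+)/).flatten.compact

# Same pattern with a lazy .*? so each quoted phrase ends at the next
# quote that is not preceded by a backslash.
lazy = s.scan(/(?:"")|(?:"(.*?[^\\])")|(\w+)/).flatten.compact

p greedy  # => ["foo", "bar baz\" qux \"quux corge"]
p lazy    # => ["foo", "bar baz", "qux", "quux corge"]
```

Both versions still drop an empty `""` entirely, because the first alternative has no capture group.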

> R.Livsey


E


Tim Heaney

1/7/2006 1:08:00 AM


Richard Livsey <richard@livsey.org> writes:

> I want to split a string into words, but group quoted words together
> such that...
>
> some words "some quoted text" some more words
>
> would get split up into:
>
> ["some", "words", "some quoted text", "some", "more", "words"]

How about the csv module? Despite the name, you don't have to use
commas.

require 'csv'
CSV::parse_line('some words "some quoted text" some more words', ' ')

I hope this helps,

Tim
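
The positional separator argument above belongs to the csv library as it shipped in 2006. In later Ruby versions (1.9 onward, as far as I know) the separator is passed as the `col_sep` option instead. A sketch, assuming a Ruby where the csv library is installed; the names `line` and `words` are only for illustration:

```ruby
require 'csv'

line = 'some words "some quoted text" some more words'

# col_sep replaces the positional separator argument of the old API.
words = CSV.parse_line(line, col_sep: ' ')

p words  # => ["some", "words", "some quoted text", "some", "more", "words"]
```

CSV treats every single space as a field boundary, so runs of spaces would likely show up as empty fields; collapsing whitespace first may be needed for looser input.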

James Gray

1/7/2006 1:53:00 AM


On Jan 6, 2006, at 6:08 PM, Richard Livsey wrote:

> I want to split a string into words, but group quoted words
> together such that...
>
> some words "some quoted text" some more words
>
> would get split up into:
>
> ["some", "words", "some quoted text", "some", "more", "words"]
>
> So far I'm drawing a blank on the 'Ruby way' to do this and the
> only solutions I can think of are turning out to be fairly ugly.
>
> Any advice would be great. Thanks in advance.

I agree that CSV is the way to go, but here's a direct attempt:

>> example = %Q{some words "some quoted text" some more words}
=> "some words \"some quoted text\" some more words"
>> example.scan(/\s+|\w+|"[^"]*"/).
?> reject { |token| token =~ /^\s+$/ }.
?> map { |token| token.sub(/^"/, "").sub(/"$/, "") }
=> ["some", "words", "some quoted text", "some", "more", "words"]

Hope that gives you some fresh ideas.

James Edward Gray II


matthew.moss.coder

1/7/2006 1:55:00 AM


> some words "some quoted text" some more words
>
> would get split up into:
>
> ["some", "words", "some quoted text", "some", "more", "words"]

s = 'some words "some quoted text" some more words'

sa = s.split(/"/).collect { |x| x.strip }
(0...sa.size).to_a.zip(sa).collect { |i,x| (i&1).zero? ? x.split : x }.flatten


Michael 'entropie' Trommer

1/7/2006 1:56:00 AM


* James Edward Gray II (james@grayproductions.net) wrote:
> >> example = %Q{some words "some quoted text" some more words}
> => "some words \"some quoted text\" some more words"
> >> example.scan(/\s+|\w+|"[^"]*"/).
> ?> reject { |token| token =~ /^\s+$/ }.
> ?> map { |token| token.sub(/^"/, "").sub(/"$/, "") }
> => ["some", "words", "some quoted text", "some", "more", "words"]

impressive


So long
--
Michael 'entropie' Trommer; http:/...

ruby -e "0.upto((a='njduspAhnbjm/dpn').size-1){|x| a[x]-=1}; p 'mailto:'+a"

matthew.moss.coder

1/7/2006 2:01:00 AM


> (0...sa.size).to_a.zip(sa).collect { |i,x| (i&1).zero? ? x.split : x }.flatten

Just realized that Range responds to zip, so the to_a is unnecessary.

This looks slightly cleaner to me:

(1..sa.size).zip(sa).collect { |i,x| (i&1).zero? ? x : x.split }.flatten
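
The same even/odd idea can be written without building an index range by hand. A sketch, assuming balanced double quotes and a Ruby recent enough to have `flat_map` (which postdates this thread); `split_quoted` is a name made up for illustration, and unlike the version above it does not strip whitespace inside the quotes:

```ruby
# After splitting on '"', even-indexed pieces lie outside quotes and
# odd-indexed pieces lie inside them (assuming the quotes are balanced).
def split_quoted(str)
  str.split('"').each_with_index.flat_map do |piece, i|
    i.even? ? piece.split : [piece]
  end
end

p split_quoted('some words "some quoted text" some more words')
# => ["some", "words", "some quoted text", "some", "more", "words"]
```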


Xavier Noria

1/7/2006 2:18:00 AM


On Jan 7, 2006, at 1:08, Richard Livsey wrote:

> I want to split a string into words, but group quoted words
> together such that...
>
> some words "some quoted text" some more words
>
> would get split up into:
>
> ["some", "words", "some quoted text", "some", "more", "words"]

Curiously, someone asked exactly that on freenode#perl tonight.

If the input is that simple and is assumed to be well-formed this is
enough:

irb(main):005:0> %q{some words "some quoted text" some "" more words}.scan(/"[^"]*"|\S+/)
=> ["some", "words", "\"some quoted text\"", "some", "\"\"", "more", "words"]

Since nothing was said about this, it does not handle escaped quotes,
and it assumes quotes are always balanced, so a field cannot be
%q{"foo}, for example.
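
To make that limitation concrete, here is what the pattern does with a stray quote. A small sketch; the inputs and variable names are made up for illustration:

```ruby
pattern = /"[^"]*"|\S+/

# Balanced quotes: the quoted phrase stays together, quotes included.
balanced = 'a "b c" d'.scan(pattern)

# Unbalanced quote: the first alternative cannot match, so \S+ takes
# over and the phrase is split on whitespace like ordinary words.
unbalanced = 'a "b c d'.scan(pattern)

p balanced    # => ["a", "\"b c\"", "d"]
p unbalanced  # => ["a", "\"b", "c", "d"]
```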

-- fxn


dblack

1/7/2006 2:34:00 AM


James Gray

1/7/2006 2:45:00 AM


On Jan 6, 2006, at 8:33 PM, dblack@wobblini.net wrote:

>>>> example = %Q{some words "some quoted text" some more words}
>> => "some words \"some quoted text\" some more words"
>>>> example.scan(/\s+|\w+|"[^"]*"/).
>> ?> reject { |token| token =~ /^\s+$/ }.
>> ?> map { |token| token.sub(/^"/, "").sub(/"$/, "") }
>> => ["some", "words", "some quoted text", "some", "more", "words"]
>
> I think you could do less work:
>
> example.scan(/"[^"]+"|\S+/).map { |word| word.delete('"') }
>
> (Or am I overlooking some reason you'd want to capture sequences of
> spaces?)
>
> I changed the \w+ to \S+ (and moved it after the | to avoid having it
> sponge up too much) in case the words included non-\w characters.

You're right, that's better all around.

> I guess with zero-width positive lookbehind/ahead one could do it
> without the map operation.

You can drop the map(), if you're willing to replace it with two
other calls:

>> example = %Q{some words "some quoted text" some more words}
=> "some words \"some quoted text\" some more words"
>> example.scan(/"([^"]+)"|(\S+)/).flatten.compact
=> ["some", "words", "some quoted text", "some", "more", "words"]

James Edward Gray II
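
The two variants in this message are not quite interchangeable: `delete('"')` removes every double quote in a token, while the capture-group version removes only the delimiting pair. A sketch of where they diverge, using a made-up input with a stray inch mark:

```ruby
example = %q{6" nails "big box"}

# Deleting every double quote also eats the one that belongs to 6".
deleted = example.scan(/"[^"]+"|\S+/).map { |word| word.delete('"') }

# Capturing the quoted text removes only the delimiting quotes.
captured = example.scan(/"([^"]+)"|(\S+)/).flatten.compact

p deleted   # => ["6", "nails", "big box"]
p captured  # => ["6\"", "nails", "big box"]
```

For the well-formed input in the original question both give the same result.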



Florian Groß

1/7/2006 4:01:00 AM
