comp.lang.ruby

Regexp question

Mark Probert

9/30/2004 9:11:00 PM


Hi, Rubyists.

What is the best way to split fields on ';' when the string looks
like this:

s = 'a;b;c\;;d;'
s.split(/???;/)
=> ["a", "b", "c\;", "d"]

Or is it best to use s.each_byte and do it by hand?

--
-mark. (probertm @ acm dot org)

10 Answers

Simon Strandgaard

9/30/2004 9:29:00 PM


On Thursday 30 September 2004 23:15, Mark Probert wrote:
> Hi, Rubyists.
>
> What is the best way of attacking field split on ';' when the string looks
> like:
>
> s = 'a;b;c\;;d;'
> s.split(/???;/)
> => ["a", "b", "c\;", "d"]
>
> Or is it best to use s.each_byte and do it by hand?

How about something ala

irb(main):015:0> "aa;bbb\\;;abc;;d\\\\;e;".scan(/(?:\\[^.]|[^;])*;/)
=> ["aa;", "bbb\\;;", "abc;", ";", "d\\\\;", "e;"]

--
Simon Strandgaard


Brian Schröder

9/30/2004 9:34:00 PM


Mark Probert wrote:
> Hi, Rubyists.
>
> What is the best way of attacking field split on ';' when the string looks
> like:
>
> s = 'a;b;c\;;d;'
> s.split(/???;/)
> => ["a", "b", "c\;", "d"]
>
> Or is it best to use s.each_byte and do it by hand?
>

Normally this would call for a fixed-width negative lookbehind,

/(?<!\\);/

but as far as I know it's not included in the Ruby regexp engine.

But for further clarification:
How should 'a;b\\;;c' be split?
If backslashes can be escaped (and you'd want that, because otherwise you
can't have a field "b\"), it's more difficult.

And maybe the CSV library can help you here.

regards,

Brian

--
Brian Schröder
http://ruby.brian-sch...
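
On the lookbehind idea in the post above: Ruby 1.9 and later ship a regexp engine that does support negative lookbehind, so on those versions the split could be sketched roughly as follows. The method name `split_unescaped` is made up for illustration, and the sketch deliberately ignores the escaped-backslash case raised in the same post.

```ruby
# Hypothetical helper (the name is not from the thread): split on every ';'
# that is not immediately preceded by a backslash.
# Needs Ruby 1.9+ for the (?<!...) negative lookbehind.
def split_unescaped(str)
  str.split(/(?<!\\);/)
end

p split_unescaped('a;b;c\;;d;')
# => ["a", "b", "c\\;", "d"]

# Limitation: an escaped backslash right before a separator is misread.
# The data here is  a;b\\;c  (field b\\ followed by a real separator),
# but the lookbehind only sees "a backslash before ';'" and does not split.
p split_unescaped('a;b\\\\;c')
# => ["a", "b\\\\;c"]
```

String#split drops trailing empty fields by default; passing a negative limit, as in `str.split(/(?<!\\);/, -1)`, keeps them.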


Simon Strandgaard

9/30/2004 9:43:00 PM


On Thursday 30 September 2004 23:29, Simon Strandgaard wrote:
> On Thursday 30 September 2004 23:15, Mark Probert wrote:
> > Hi, Rubyists.
> >
> > What is the best way of attacking field split on ';' when the string
> > looks like:
> >
> > s = 'a;b;c\;;d;'
> > s.split(/???;/)
> > => ["a", "b", "c\;", "d"]
> >
> > Or is it best to use s.each_byte and do it by hand?
>
> How about something ala
>
> irb(main):015:0> "aa;bbb\\;;abc;;d\\\\;e;".scan(/(?:\\[^.]|[^;])*;/)
> => ["aa;", "bbb\\;;", "abc;", ";", "d\\\\;", "e;"]


Maybe this one is better?

irb(main):001:0> "aa;bbb\\;;abc;;d\\\\;e;f".scan(/(?:\A|;)((?:\\[^.]|[^;])*)/) { p $1 }
"aa"
"bbb\\;"
"abc"
""
"d\\\\"
"e"
"f"
=> "aa;bbb\\;;abc;;d\\\\;e;f"
irb(main):002:0>

--
Simon Strandgaard


Mark Probert

9/30/2004 9:47:00 PM


Hi ..

Simon Strandgaard <neoneye@adslhome.dk> wrote:
>
> How about something ala
>
> irb(main):015:0> "aa;bbb\\;;abc;;d\\\\;e;".scan(/(?:\\[^.]|[^;])*;/)
> => ["aa;", "bbb\\;;", "abc;", ";", "d\\\\;", "e;"]
>

Thanks! That is close enough:

irb(main):019:0> s.scan(/(?:\\[^.]|[^;])*/).each do |it|
irb(main):020:1* next if it.empty?
irb(main):021:1> puts " --> #{it}"
irb(main):022:1> end
--> a is a word
--> b is too
--> c\; for fun
--> d -- forget it
=> ["a is a word", "", "b is too", "", "c\\; for fun", "", "d -- forget it", "", ""]



--
-mark. (probertm @ acm dot org)

Dany Cayouette

9/30/2004 9:57:00 PM



> But for further clarification:
> How should 'a;b\\;;c' be split?
My guess is that it should be
["a", "b\", nil, "c"]

The characters escaped by a backslash are semicolon, colon and backslash, i.e.

; => \;   : => \:   \ => \\
> If backslashs can be escaped (and you'd want that because otherwise you
> can't have a field "b\" its more difficult.
>
> And maybe the CSV library can help you here.

thanks,
Dany

Dany Cayouette

9/30/2004 10:11:00 PM


On Thu, 30 Sep 2004 17:57:19 -0400
Dany Cayouette <danyc@nortelnetworks.com> wrote:

>
> > But for further clarification:
> > How should 'a;b\\;;c' be split?
> Guess is that it should be
> ["a", "b\", nil, "c"]
Sorry... I meant
["a", "b\\", nil, "c"] where b\\ would ultimately become b\ when the escape chars are processed in the data portion
>
> characters escaped by backslash at semi-colon, colon and backslash i.e.
>
> ; => \; : => \: \ => \\
>
> > If backslashs can be escaped (and you'd want that because otherwise you
> > can't have a field "b\" its more difficult.
> >
Didn't think about that one... I thought this was simple and the problem was my lack of programming experience...

Dany
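
The two-stage approach described above (split first, then process the escape characters inside each field) could look something like this for the second stage. This is only a sketch: `unescape_field` is a made-up name, it assumes a backslash always escapes exactly the one character that follows it, and the empty field is shown as "" rather than nil.

```ruby
# Hypothetical helper (not from the thread): drop the escaping backslash
# and keep the character it protects, so \; => ;  \: => :  \\ => \
def unescape_field(field)
  field.gsub(/\\(.)/m) { $1 }
end

# Fields as they might come out of the split stage.
# In single quotes, 'b\\\\' is the three characters  b \ \
fields = ['a', 'b\\\\', '', 'c']

p fields.map { |f| unescape_field(f) }
# => ["a", "b\\", "", "c"]   (inspect output; the second field is  b\ )
```

The block form of gsub is used so that the replacement does not itself need backslash escaping.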

Florian Gross

9/30/2004 11:09:00 PM


Mark Probert wrote:

> Hi, Rubyists.

Moin!

> What is the best way of attacking field split on ';' when the string looks
> like:
>
> s = 'a;b;c\;;d;'
> s.split(/???;/)
> => ["a", "b", "c\;", "d"]
>
> Or is it best to use s.each_byte and do it by hand?

This works (even with escaped escape characters), but you might be
better off doing it by hand to keep complexity low:

> irb(main):025:0> str = "hello;world;foo\\;bar;no escape\\\\;blar"; puts str
> hello;world;foo\;bar;no escape\\;blar
> => nil
> irb(main):026:0> str.scan(/(?:(?!\\).(?:\\{2})*\\;|[^;])+/).map { |str| str.gsub(/\\(.)/, '\1') }
> => ["hello", "world", "foo;bar", "no escape\\", "blar"]

Regards,
Florian Gross

Robert Klemme

10/1/2004 7:45:00 AM

"Mark Probert" <probertm@nospam-acm.org> wrote in message
news:Xns95749654816D0probertmnospamtelusn@198.161.157.145...
> Hi ..
>
> Simon Strandgaard <neoneye@adslhome.dk> wrote:
> >
> > How about something ala
> >
> > irb(main):015:0> "aa;bbb\\;;abc;;d\\\\;e;".scan(/(?:\\[^.]|[^;])*;/)
> > => ["aa;", "bbb\\;;", "abc;", ";", "d\\\\;", "e;"]
> >
>
> Thanks! That is close enough:
>
> irb(main):019:0> s.scan(/(?:\\[^.]|[^;])*/).each do |it|
> irb(main):020:1* next if it.empty?
> irb(main):021:1> puts " --> #{it}"
> irb(main):022:1> end
> --> a is a word
> --> b is too
> --> c\; for fun
> --> d -- forget it
> => ["a is a word", "", "b is too", "", "c\\; for fun", "", "d -- forget it", "", ""]


>> s = "aa;bbb\\;;abc;;d\\\\;e;"
=> "aa;bbb\\;;abc;;d\\\\;e;"
>> s.scan /(?:\\.|[^\\;])+/
=> ["aa", "bbb\\;", "abc", "d\\\\", "e"]

Regards

robert

Simon Strandgaard

10/1/2004 4:33:00 PM


On Friday 01 October 2004 09:45, Robert Klemme wrote:
[snip]
> >> s = "aa;bbb\\;;abc;;d\\\\;e;"
> => "aa;bbb\\;;abc;;d\\\\;e;"
> >> s.scan /(?:\\.|[^\\;])+/
> => ["aa", "bbb\\;", "abc", "d\\\\", "e"]


If it's a CSV file, shouldn't the output then be:

["aa", "bbb\\;", "abc", "", "d\\\\", "e", ""]

--
Simon Strandgaard



Robert Klemme

10/1/2004 9:42:00 PM

"Simon Strandgaard" <neoneye@adslhome.dk> wrote in message
news:200410012022.59526.neoneye@adslhome.dk...
> On Friday 01 October 2004 09:45, Robert Klemme wrote:
> [snip]
>> >> s = "aa;bbb\\;;abc;;d\\\\;e;"
>> => "aa;bbb\\;;abc;;d\\\\;e;"
>> >> s.scan /(?:\\.|[^\\;])+/
>> => ["aa", "bbb\\;", "abc", "d\\\\", "e"]
>
>
> If its a csv file.. shouldn't output then be?
>
> ["aa", "bbb\\;", "abc", "", "d\\\\", "e", ""]

Darn! You're right. Unfortunately using "*" instead of "+" is not
sufficient: far too many empty strings are found that way.

robert
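
One way around the empty-string problem is the by-hand approach mentioned earlier in the thread. A minimal sketch, assuming a backslash escapes whatever character follows it and that empty and trailing fields should be kept; `split_fields` is a made-up name, not something posted in the thread:

```ruby
# Hypothetical helper: split on unescaped separators in a single pass,
# keeping escape sequences verbatim and preserving empty fields.
def split_fields(str, sep = ';')
  fields = [''.dup]                # start with one (so far empty) field
  escaped = false
  str.each_char do |ch|
    if escaped
      fields.last << '\\' << ch    # keep the backslash and the escaped char
      escaped = false
    elsif ch == '\\'
      escaped = true               # decide what to do on the next character
    elsif ch == sep
      fields << ''.dup             # unescaped separator: start a new field
    else
      fields.last << ch
    end
  end
  fields.last << '\\' if escaped   # dangling backslash at end of input
  fields
end

p split_fields("aa;bbb\\;;abc;;d\\\\;e;")
# => ["aa", "bbb\\;", "abc", "", "d\\\\", "e", ""]
```

Because the escape state is tracked explicitly, `d\\;` is read as an escaped backslash followed by a real separator, and every separator produces exactly one field boundary, so no stray empty strings appear.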