Asp Forum - Break apart a string by kind of characters

Daniel Waite

9/27/2007 3:55:00 PM

Hi all, I've an interesting problem. Imagine the following string:

'a1000aa'

I want to break it apart like so:

[ 'a', '1000', 'aa' ]

I did a search on the forums and came up with this regex:

'a1000aa'.scan(/((.)\2*)/).map { |i| i[0] }

Which is pretty close, but it groups on a change of character, so I
would get:

[ 'a', '1', '000', 'aa' ]

I tried playing around with the regex (e.g. swapping the . for (\d|\w))
but to no avail.

Any ideas?
--
Posted via http://www.ruby-....
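
A note on why the regex in the question splits '1000' into '1' and '000': a backreference such as \2 matches the exact text that group 2 captured, not the pattern inside the group. Swapping the . for (\d|\w) therefore should not change the grouping. The sketch below shows both variants side by side; the variable names are illustrative and do not come from the post.

```ruby
s = 'a1000aa'

# Group 2 captures one character; \2* then repeats that same character only.
by_dot = s.scan(/((.)\2*)/).map { |groups| groups[0] }

# Same structure with (\d|\w): \2 still repeats the captured text,
# not "any digit or word character".
by_class = s.scan(/((\d|\w)\2*)/).map { |groups| groups[0] }

p by_dot
p by_class
```

Both calls print ["a", "1", "000", "aa"], which is the result described in the question.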

9 Answers

Daniel Waite

9/27/2007 4:31:00 PM

Daniel Waite wrote:
> Hi all, I've an interesting problem. Imagine the following string:
>
> 'a1000aa'
>
> I want to break it apart like so:
>
> [ 'a', '1000', 'aa' ]
>
> I did a search on the forums and came up with this regex:
>
> 'a1000aa'.scan(/((.)\2*)/).map { |i| i[0] }
>
> Which is pretty close, but it groups on a change of character, so I
> would get:
>
> [ 'a', '1', '000', 'aa' ]
>
> I tried playing around with the regex (e.g. swapping the . for (\d|\w))
> but to no avail.
>
> Any ideas?

I figured out one possible solution. Granted, it's not as elegant as a
single regex, but it works and I understand it. Here goes...

First, I opened up class String to add some convenience and make things
a bit shorter:

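class String

  # True if the first character is an ASCII letter.
  # (String#first comes from ActiveSupport; self[0, 1] works in plain Ruby.)
  def letter?
    !!(self[0, 1] =~ /[A-Za-z]/)
  end

  # True if the first character is a decimal digit.
  def digit?
    !!(self[0, 1] =~ /[0-9]/)
  end

end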

And my method:

def break_apart_rule_increment
  groups = Array.new
  string = 'a1000aa'

  string.each_char do |character|
    # Put the first character into a group.
    groups << character and next if groups.empty?

    # If this character is of the same kind as the last,
    # add it to the group, otherwise, create a new group
    # and put it there.
    if (groups.last.letter? and character.letter?) or
       (groups.last.digit? and character.digit?)
      groups.last << character
    else
      groups << character
    end
  end

  groups
end
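
For comparison, here is a minimal sketch of the same grouping idea without reopening String, assuming Ruby 2.3 or later for Enumerable#chunk_while. The names char_kind and break_apart are hypothetical helpers invented for this illustration; they are not part of the post above. One difference in behavior: adjacent characters that are neither letters nor digits end up in one group here, whereas the method above starts a new group for each of them.

```ruby
# Hypothetical helper: classify a single character by kind.
def char_kind(ch)
  case ch
  when /[A-Za-z]/ then :letter
  when /[0-9]/    then :digit
  else                 :other
  end
end

# Hypothetical helper: split a string into runs of same-kind characters.
def break_apart(string)
  string.each_char
        .chunk_while { |a, b| char_kind(a) == char_kind(b) }
        .map(&:join)
end

p break_apart('a1000aa')
p break_apart('11aa1000aaa')
```

The first call prints ["a", "1000", "aa"] and the second prints ["11", "aa", "1000", "aaa"]. Taking the string as a parameter, rather than hard-coding it, makes the helper reusable.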


Phrogz

9/27/2007 5:11:00 PM

On Sep 27, 9:55 am, Daniel Waite <rabbitb...@gmail.com> wrote:
> Hi all, I've an interesting problem. Imagine the following string:
>
> 'a1000aa'
>
> I want to break it apart like so:
>
> [ 'a', '1000', 'aa' ]

irb(main):001:0> s = 'a1000aa'
=> "a1000aa"
irb(main):002:0> s.split( /(\d+)/ )
=> ["a", "1000", "aa"]

Phrogz

9/27/2007 5:37:00 PM

On Sep 27, 11:11 am, Phrogz <phr...@mac.com> wrote:
> On Sep 27, 9:55 am, Daniel Waite <rabbitb...@gmail.com> wrote:
>
> > Hi all, I've an interesting problem. Imagine the following string:
>
> > 'a1000aa'
>
> > I want to break it apart like so:
>
> > [ 'a', '1000', 'aa' ]
>
> irb(main):001:0> s = 'a1000aa'
> => "a1000aa"
> irb(main):002:0> s.split( /(\d+)/ )
> => ["a", "1000", "aa"]

Or, if you want multiple types of character groupings:

irb(main):001:0> s = 'hello world, you crazy world!'
=> "hello world, you crazy world!"

irb(main):003:0> s.scan( /[aeiou]+|[b-df-hj-np-tv-z]+|[^a-z]+/ )
=> ["h", "e", "ll", "o", " ", "w", "o", "rld", ", ", "y", "ou", " ",
"cr", "a", "zy", " ", "w", "o", "rld", "!"]

Daniel Waite

9/28/2007 12:06:00 AM

Gavin Kistner wrote:
> irb(main):001:0> s = 'a1000aa'
> => "a1000aa"
> irb(main):002:0> s.split( /(\d+)/ )
> => ["a", "1000", "aa"]

WOW! Freakin' awesome!

One caveat...

irb(main):004:0> '11aa1000aaa'.split(/(\d+)/)
=> ["", "11", "aa", "1000", "aaa"]

For some reason it answers with a blank element, but I'm sure that's an
easy one to solve.

Thanks, Gavin!
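
The blank first element appears because the string starts with a match of the delimiter pattern, so the text before the first delimiter is the empty string. One way to drop it, shown here as a sketch rather than the only fix, is to filter out empty strings after the split. The variable names are illustrative.

```ruby
s = '11aa1000aaa'

# The leading "" is the (empty) text before the first run of digits.
parts = s.split(/(\d+)/)

# Drop empty strings from the result.
cleaned = parts.reject(&:empty?)

p parts
p cleaned
```

This prints ["", "11", "aa", "1000", "aaa"] followed by ["11", "aa", "1000", "aaa"].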

James Gray

9/28/2007 12:09:00 AM

On Sep 27, 2007, at 7:05 PM, Daniel Waite wrote:

> Gavin Kistner wrote:
>> irb(main):001:0> s = 'a1000aa'
>> => "a1000aa"
>> irb(main):002:0> s.split( /(\d+)/ )
>> => ["a", "1000", "aa"]
>
> WOW! Freakin' awesome!
>
> One caveat...
>
> irb(main):004:0> '11aa1000aaa'.split(/(\d+)/)
> => ["", "11", "aa", "1000", "aaa"]
>
> For some reason it answers with a blank element, but I'm sure that's an
> easy one to solve.

If you just want digits and non-digits, I suggest:

>> '11aa1000aaa'.scan(/\D+|\d+/)
=> ["11", "aa", "1000", "aaa"]

James Edward Gray II

Daniel Waite

9/28/2007 6:34:00 AM

James Gray wrote:
> If you just want digits and non-digits, I suggest:
>
> >> '11aa1000aaa'.scan(/\D+|\d+/)
> => ["11", "aa", "1000", "aaa"]

I LOVE it! I gotta brush up on my regex skills. Wait, I need to get some
regex skills first. :)

Thanks, Edward; that made my night.


Lloyd Linklater

9/28/2007 11:09:00 AM

Gavin Kistner wrote:
> On Sep 27, 9:55 am, Daniel Waite <rabbitb...@gmail.com> wrote:
>> Hi all, I've an interesting problem. Imagine the following string:
>>
>> 'a1000aa'
>>
>> I want to break it apart like so:
>>
>> [ 'a', '1000', 'aa' ]
>
> irb(main):001:0> s = 'a1000aa'
> => "a1000aa"
> irb(main):002:0> s.split( /(\d+)/ )
> => ["a", "1000", "aa"]

Gavin, how in the WORLD does this bit of black magic work and how did
you ever figure it out???

James Gray

9/28/2007 12:24:00 PM

On Sep 28, 2007, at 6:09 AM, Lloyd Linklater wrote:

> Gavin Kistner wrote:
>> On Sep 27, 9:55 am, Daniel Waite <rabbitb...@gmail.com> wrote:
>>> Hi all, I've an interesting problem. Imagine the following string:
>>>
>>> 'a1000aa'
>>>
>>> I want to break it apart like so:
>>>
>>> [ 'a', '1000', 'aa' ]
>>
>> irb(main):001:0> s = 'a1000aa'
>> => "a1000aa"
>> irb(main):002:0> s.split( /(\d+)/ )
>> => ["a", "1000", "aa"]
>
> Gavin,

I'm not Gavin, but...

> how in the WORLD does this bit of black magic work

Captures in a Regexp passed to split() are returned as part of the
result.
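
To make that concrete, here is a small sketch contrasting split with and without a capture group. The variable names are illustrative.

```ruby
s = 'a1000aa'

# No capture group: the matched delimiters are discarded.
delimiters_dropped = s.split(/\d+/)

# Capture group: each captured delimiter is kept in the result.
delimiters_kept = s.split(/(\d+)/)

# A non-capturing group (?:...) behaves like no group at all.
non_capturing = s.split(/(?:\d+)/)

p delimiters_dropped
p delimiters_kept
p non_capturing
```

This prints ["a", "aa"], then ["a", "1000", "aa"], then ["a", "aa"] again.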

> and how did you ever figure it out???

Interestingly, the documentation doesn't seem to mention it. I guess
I knew it was there because Perl works the same way and I tried it
sometime.

James Edward Gray II

Yossef Mendelssohn

9/28/2007 1:07:00 PM

On Sep 28, 7:23 am, James Edward Gray II <ja...@grayproductions.net>
wrote:

> I'm not Gavin, but...

Ditto

> Interestingly, the documentation doesn't seem to mention it. I guess
> I knew it was there because Perl works the same way and I tried it
> sometime.

Ditto

> James Edward Gray II

Not ditto

--
-yossef

comp.lang.ruby
