comp.lang.ruby

Regular Expressions

Justin To

6/17/2008 4:08:00 PM

Hello! I'm working on a problem where I must match version strings in a CSV
file. They could be anything like:

v.6.0.3-3
aajd4-43_3
ABCD 5.0
ABCDv.5.0
A 3.40
...

With the other fields in mind, I thought "heck, looks like versions are
the only ones that contain a series of letters, digits, periods,
underscores and dashes..."

I'm pretty new to Ruby so I don't have very much experience with regular
expressions. Is it possible to write just one regular expression to
solve my problem? I need a regular expression that will return:

v.6.0.3-3: true because there's a v followed by a series of '.' and
digits
aajd4-43_3: true because there's a series of digits, '-', and '_'
ABCD 5.0: true because there's a series of digits and '.'
ABCDv.5.0: true...
A 3.40: true...

Thanks for the help!
--
Posted via http://www.ruby-....

4 Answers

Jesús Gabriel y Galán

6/17/2008 4:48:00 PM

On Tue, Jun 17, 2008 at 6:08 PM, Justin To <tekmc@hotmail.com> wrote:
> Hello! I'm trying this problem that says I must match versions in a CSV
> file,
>
> could be anything like:
>
> v.6.0.3-3
> aajd4-43_3
> ABCD 5.0
> ABCDv.5.0
> A 3.40
> ...
>
> With the other fields in mind, I thought "heck, looks like versions are
> the only ones that contain a series of letters, digits, periods,
> underscores and dashes..."
>
> I'm pretty new to Ruby so I don't have very much experience with regular
> expressions. Is it possible to make just one regular expression to
> fulfill my problem? I need a regular expression that will return:
>
> v.6.0.3-3: true because there's a v followed by a series of '.' and
> digits
> aajd4-43_3: true because there's a series of digits, '-', and '_'
> ABCD 5.0: true because there's a series of digits and '.'
> ABCDv.5.0: true...
> A 3.40: true...

I think there's some information missing here: how many of
these characters form a "series"? More than one? Do they
have to be interleaved in some order (for example, digits
followed by a ., a - or a _, followed by more digits), or doesn't it matter?

The simplest case: two or more of those characters in a row:

irb(main):023:0> versions = ["v.6.0.3-3", "aajd4-43_3","ABCD 5.0",
"ABCDv.5.0", "A 3.40"]
=> ["v.6.0.3-3", "aajd4-43_3", "ABCD 5.0", "ABCDv.5.0", "A 3.40"]
irb(main):024:0> r = /[-._\d]{2,}/
=> /[-._\d]{2,}/
irb(main):025:0> versions.each {|x| puts "#{x}: #{(x =~ r) != nil}"}
v.6.0.3-3: true
aajd4-43_3: true
ABCD 5.0: true
ABCDv.5.0: true
A 3.40: true
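
A note on character classes such as the one above: a hyphen between two characters denotes a range, so a literal hyphen is safest placed first or last in the class. A minimal sketch of the difference, using made-up sample strings (not taken from the CSV) and a hypothetical helper named hit?:

```ruby
# Inside a character class, "x-y" is a range. ".-_" covers every character
# from "." (0x2E) to "_" (0x5F), which includes 0-9 and A-Z but not "-" (0x2D).
RANGE_CLASS   = /[.-_]{2,}/   # hyphen in the middle: read as a range
LITERAL_CLASS = /[-._]{2,}/   # hyphen first: read as a literal "-"

# Hypothetical helper: true if the pattern matches anywhere in the string.
def hit?(pattern, string)
  !(string =~ pattern).nil?
end

# Made-up strings for illustration only.
["ABCD", "hello", "a-.b"].each do |s|
  puts "#{s}: range=#{hit?(RANGE_CLASS, s)} literal=#{hit?(LITERAL_CLASS, s)}"
end
```

With the range form, plain uppercase text such as "ABCD" counts as a match, which is probably not what is wanted when looking for versions.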

1 or more digits, followed by . or _ or -, followed by one or more digits:

irb(main):030:0> r = /\d+[-._]\d+/
=> /\d+[-._]\d+/
irb(main):031:0> versions.each {|x| puts "#{x}: #{(x =~ r) != nil}"}
v.6.0.3-3: true
aajd4-43_3: true
ABCD 5.0: true
ABCDv.5.0: true
A 3.40: true

You will have to refine your requirements a little bit, in order to choose among
these (and any variations on this).
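
One way to choose is to run both candidates against fields that are not versions. The sketch below assumes some hypothetical non-version fields (the real CSV columns are not shown in the question) and writes the first pattern with the hyphen first and \d for the digits:

```ruby
LOOSE  = /[-._\d]{2,}/    # two or more of: - . _ or a digit, in a row
STRICT = /\d+[-._]\d+/    # digits, one separator, digits

# Hypothetical non-version fields; replace with real values from the CSV.
OTHER_FIELDS = ["2008-06-17", "12345", "John Smith", "..."]

# Returns [loose?, strict?] for a field.
def verdicts(field)
  [!(field =~ LOOSE).nil?, !(field =~ STRICT).nil?]
end

OTHER_FIELDS.each do |field|
  loose, strict = verdicts(field)
  puts "#{field.inspect}: loose=#{loose} strict=#{strict}"
end
```

Any field that comes back true here is a false positive, so the pattern (or the requirements) would need tightening for that column.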

Jesus.

Robert Klemme

6/17/2008 4:48:00 PM

On 17.06.2008 18:08, Justin To wrote:
> Hello! I'm trying this problem that says I must match versions in a CSV
> file,
>
> could be anything like:
>
> v.6.0.3-3
> aajd4-43_3
> ABCD 5.0
> ABCDv.5.0
> A 3.40
> ..
>
> With the other fields in mind, I thought "heck, looks like versions are
> the only ones that contain a series of letters, digits, periods,
> underscores and dashes..."
>
> I'm pretty new to Ruby so I don't have very much experience with regular
> expressions. Is it possible to make just one regular expression to
> fulfill my problem? I need a regular expression that will return:
>
> v.6.0.3-3: true because there's a v followed by a series of '.' and
> digits
> aajd4-43_3: true because there's a series of digits, '-', and '_'
> ABCD 5.0: true because there's a series of digits and '.'
> ABCDv.5.0: true...
> A 3.40: true...
>
> Thanks for the help!

Yes, that's easy, just /./ as an expression.

Seriously, what the expression must *not* match is just as crucial.

The easiest (but not the most efficient) approach would be to create one
alternative for each variant you have, like

%r{
^(?:
v(?:\.\d+)+-\d+
| \w+\d+-[\d_]+
| ...
)$
}x

etc.

But given the number of alternatives you present it might be difficult
to avoid also matching other stuff. At least, you'll face a pretty
complex regular expression.
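
For illustration only, here is what a filled-in expression of that shape might look like for the five samples in the question. The alternatives are guesses made for this sketch, not the ones elided above, and the names VERSION and version? are hypothetical. It uses \A and \z rather than ^ and $, since in Ruby ^ and $ match at line boundaries inside a string.

```ruby
# One guessed alternative per sample shape from the question.
VERSION = %r{
  \A(?:
      v(?:\.\d+)+-\d+              # v.6.0.3-3
    | [a-z]+\d+-\d+_\d+            # aajd4-43_3
    | [A-Z]+v\.\d+(?:\.\d+)*       # ABCDv.5.0
    | [A-Z]+\x20\d+\.\d+           # ABCD 5.0, A 3.40 (\x20 is a space)
  )\z
}x

# Hypothetical helper: true if the whole field looks like a version.
def version?(field)
  !(field =~ VERSION).nil?
end

["v.6.0.3-3", "aajd4-43_3", "ABCD 5.0", "ABCDv.5.0", "A 3.40"].each do |s|
  puts "#{s}: #{version?(s)}"
end
```

Even with only four alternatives the expression is already fairly dense, and each one encodes an assumption about the data that has to be checked against the real file.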

Kind regards

robert

Dave Bass

6/18/2008 11:41:00 AM

Robert Klemme wrote:
> The easiest (but not the most efficient) approach would be to create one
> alternative for each variant you have, like
>
> %r{
> ^(?:
> v(?:\.\d+)+-\d+
> | \w+\d+-[\d_]+
> | ...
> )$
> }x
>
> etc.

The problem with regular expressions is that they can easily get out of
hand and become incomprehensible, as the above code shows (though
presumably to RK it's totally transparent).

Better to write a number of small regexps, each testing for a specific
pattern. Then combine the results with a logical OR. This can be done
using a flag variable, or an if-elsif tree, a case statement, etc.,
whatever you feel happiest with. This approach will be a lot easier to
test and debug.
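
A minimal sketch of that idea, assuming one small pattern per sample shape from the question. The pattern names, the patterns themselves and the version_kind helper are all hypothetical:

```ruby
# Each small pattern can be tested and debugged on its own.
PATTERNS = {
  v_dotted:    /\Av(?:\.\d+)+-\d+\z/,         # v.6.0.3-3
  word_dash:   /\A[a-z]+\d+-\d+_\d+\z/,       # aajd4-43_3
  name_v:      /\A[A-Z]+v\.\d+(?:\.\d+)*\z/,  # ABCDv.5.0
  name_number: /\A[A-Z]+ \d+\.\d+\z/,         # ABCD 5.0, A 3.40
}

# Returns the name of the first pattern that matches, or nil if none does.
def version_kind(field)
  pair = PATTERNS.find { |_name, re| field =~ re }
  pair && pair.first
end

# The logical OR of all the small tests.
def version?(field)
  !version_kind(field).nil?
end

["v.6.0.3-3", "ABCD 5.0", "John Smith"].each do |s|
  puts "#{s}: #{version_kind(s).inspect}"
end
```

Returning the pattern name rather than a bare true makes it easier to see which rule fired when a field is matched unexpectedly.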

Robert Klemme

6/18/2008 12:21:00 PM

2008/6/18 Dave Bass <davebass@musician.org>:
> Robert Klemme wrote:
>> The easiest (but not the most efficient) approach would be to create one
>> alternative for each variant you have, like
>>
>> %r{
>> ^(?:
>> v(?:\.\d+)+-\d+
>> | \w+\d+-[\d_]+
>> | ...
>> )$
>> }x
>>
>> etc.
>
> The problem with regular expressions is that they can easily get out of
> hand and become incomprehensible, as the above code shows (though
> presumably to RK it's totally transparent).

Actually the RX I presented was not complete and was intended to
convey your point. :-)

> Better to write a number of small regexps, each testing for a specific
> pattern. Then combine the results with a logical OR. This can be done
> using a flag variable, or an if-elsif tree, a case statement, etc.,
> whatever you feel happiest with. This approach will be a lot easier to
> test and debug.

Depends. If you build the one RX one alternative at a time and test
during each iteration, I'd say that works pretty well too. And if
the volume of data is high, the performance advantage of a single RX
might pay off.
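
A rough sketch of building the single RX step by step with Regexp.union, checking coverage after each added alternative. The alternatives and the non-version fields are assumptions made for this example, and build is a hypothetical helper:

```ruby
SAMPLES   = ["v.6.0.3-3", "aajd4-43_3", "ABCD 5.0", "ABCDv.5.0", "A 3.40"]
NEGATIVES = ["John Smith", "2008-06-17"]   # hypothetical non-version fields

# Guessed alternatives, one per sample shape.
ALTERNATIVES = [
  /v(?:\.\d+)+-\d+/,
  /[a-z]+\d+-\d+_\d+/,
  /[A-Z]+v\.\d+(?:\.\d+)*/,
  /[A-Z]+ \d+\.\d+/,
]

# Combine the alternatives into one anchored expression.
def build(alternatives)
  /\A#{Regexp.union(alternatives)}\z/
end

# Add one alternative per iteration and report coverage each time.
ALTERNATIVES.each_index do |i|
  rx      = build(ALTERNATIVES[0..i])
  covered = SAMPLES.select { |s| s =~ rx }
  leaked  = NEGATIVES.select { |s| s =~ rx }
  puts "#{i + 1} alternative(s): covers #{covered.size}/#{SAMPLES.size}, " \
       "false positives: #{leaked.size}"
end
```

Whether the single expression is actually faster than several small ones would have to be measured on the real data.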

Kind regards

robert


--
use.inject do |as, often| as.you_can - without end