[lnkForumImage]
TotalShareware - Download Free Software

Confronta i prezzi di migliaia di prodotti.
Asp Forum
 Home | Login | Register | Search 


 

Forums >

comp.lang.ruby

How would you design regexps in the integer domain?

Andreas Launila

5/5/2008 8:25:00 AM

I'm trying to come up with a clean way to specify regexps in the integer
domain. I.e. instead of describing a pattern of characters (as in normal
regexps) one describes patterns of integers ("17 followed by 3 or 15"
rather than "'a' followed by 'b' or 'c'").

My main objective is to make the syntax intuitive to Ruby users. I have
been toying around with a few different approaches, but I'm not sure if
any meet the goal. How would you design the syntax for regexps in the
integer domain?

The syntax does not need to support especially many operators:
* Kleene star ('*' in character regexps)
* "At least once" ('+' in character regexps)
* Alternation ('|' in character regexps, i.e. "a|b" being "'a' or 'b'")

Below are some of the approaches one could take.


== String representation

Many Ruby users have used character regexps defined as strings, so it
would seem like a good idea to make integer regexps as similar as
possible. A delimiter would be needed though, since "17" could otherwise
mean "1 followed by 7" or just "17". One delimiter could for instance be
a blank space. The following is how the pattern "Any number of (17 or, 1
followed by 5) followed by 4711" could look.

IntRegexp.new('(17|1 5)*4711')


== Method combination

In this approach the regexp is gradually built up by invoking methods.
One major question is what the methods should be named.

The following is how the above example could look.

first_part = IntRegexp.new(17).or(IntRegexp.new(1).followed_by 5)
first_part.any_number_of_times.followed_by 4711

Or with some different method names

first_part = IntRegexp.new(17) | (IntRegexp.new(1) + 5)
first_part.any_times + 4711

I'm unsure what sort of method names would strike a good balance between
verbosity and readability.


== Blocks

TextualRegexp[1] provides a block-based interface for specifying normal
regexps. Perhaps something similar could be done for integer regexps?
The following is the syntax of TextualRegexp but with integers.

regexp do
at_least_zero do
any :of do
integer 17
group{ integer 1; integer 5 }
end
end
integer 4711
end

There's a fair amount of redundancy since there will never be any thing
other than "integer" specified.


[1] http://rubyforge.org/projec...

--
Andreas Launila


12 Answers

Robert Klemme

5/5/2008 8:45:00 AM

0

2008/5/5 Andreas Launila <ruby-talk@lokorin.org>:
> I'm trying to come up with a clean way to specify regexps in the integer
> domain. I.e. instead of describing a pattern of characters (as in normal
> regexps) one describes patterns of integers ("17 followed by 3 or 15"
> rather than "'a' followed by 'b' or 'c'").

Why do you need this, i.e. what advantages do you expect for a
particular "integer regexp" over classic regular expressions?

> My main objective is to make the syntax intuitive to Ruby users. I have
> been toying around with a few different approaches, but I'm not sure if
> any meet the goal. How would you design the syntax for regexps in the
> integer domain?
>
> The syntax does not need to support especially many operators:
> * Kleene star ('*' in character regexps)
> * "At least once" ('+' in character regexps)
> * Alternation ('|' in character regexps, i.e. "a|b" being "'a' or 'b'")
>
> Below are some of the approaches one could take.
>
>
> == String representation
>
> Many Ruby users have used character regexps defined as strings, so it
> would seem like a good idea to make integer regexps as similar as
> possible. A delimiter would be needed though, since "17" could otherwise
> mean "1 followed by 7" or just "17". One delimiter could for instance be
> a blank space. The following is how the pattern "Any number of (17 or, 1
> followed by 5) followed by 4711" could look.
>
> IntRegexp.new('(17|1 5)*4711')

You can as well do /(17|1 5)*4711/, can't you?

> == Method combination
>
> In this approach the regexp is gradually built up by invoking methods.
> One major question is what the methods should be named.
>
> The following is how the above example could look.
>
> first_part = IntRegexp.new(17).or(IntRegexp.new(1).followed_by 5)
> first_part.any_number_of_times.followed_by 4711

This looks ugly.

> Or with some different method names
>
> first_part = IntRegexp.new(17) | (IntRegexp.new(1) + 5)
> first_part.any_times + 4711
>
> I'm unsure what sort of method names would strike a good balance between
> verbosity and readability.
>
>
> == Blocks
>
> TextualRegexp[1] provides a block-based interface for specifying normal
> regexps. Perhaps something similar could be done for integer regexps?

The change would be easy, i.e. you just need to allow anything where
so far only raw strings were allowed and internally convert it via
#to_s.

> The following is the syntax of TextualRegexp but with integers.
>
> regexp do
> at_least_zero do
> any :of do
> integer 17
> group{ integer 1; integer 5 }
> end
> end
> integer 4711
> end
>
> There's a fair amount of redundancy since there will never be any thing
> other than "integer" specified.
>
>
> [1] http://rubyforge.org/projec...

Apparently the project has gone and I could not prevent it since Ari
did not give me access to it. However, I posted an early version
here:
http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-t...

Kind regards

robert

--
use.inject do |as, often| as.you_can - without end

Andreas Launila

5/5/2008 9:52:00 AM

0

Robert Klemme wrote:
> 2008/5/5 Andreas Launila <ruby-talk@lokorin.org>:
>> I'm trying to come up with a clean way to specify regexps in the integer
>> domain. I.e. instead of describing a pattern of characters (as in normal
>> regexps) one describes patterns of integers ("17 followed by 3 or 15"
>> rather than "'a' followed by 'b' or 'c'").
>
> Why do you need this, i.e. what advantages do you expect for a
> particular "integer regexp" over classic regular expressions?
>

The two types of regexps are not really comparable since they work in
different domains. I.e. classic regular expressions describe patterns in
arrays of characters rather than in arrays of integers. One could for
instance use an integer regexp to decide whether an array begins with 17
and ends with 8.

Specifically this is for specifying deterministic finite automatons,
with an integer alphabet, for an upcoming constraint in Gecode/R[1].

>>
>> IntRegexp.new('(17|1 5)*4711')
>
> You can as well do /(17|1 5)*4711/, can't you?
>

True, assuming that one then convert it back to its source to reparse.
It would probably make it even more familiar.


[1] http://gecoder.ruby...

--
Andreas Launila


Eivind Eklund

5/5/2008 10:55:00 AM

0

On Mon, May 5, 2008 at 11:51 AM, Andreas Launila <ruby-talk@lokorin.org> wrote:
> Robert Klemme wrote:
> > 2008/5/5 Andreas Launila <ruby-talk@lokorin.org>:
> >> I'm trying to come up with a clean way to specify regexps in the integer
> >> domain. I.e. instead of describing a pattern of characters (as in normal
> >> regexps) one describes patterns of integers ("17 followed by 3 or 15"
> >> rather than "'a' followed by 'b' or 'c'").
> >
> > Why do you need this, i.e. what advantages do you expect for a
> > particular "integer regexp" over classic regular expressions?
> >
>
> The two types of regexps are not really comparable since they work in
> different domains. I.e. classic regular expressions describe patterns in
> arrays of characters rather than in arrays of integers. One could for
> instance use an integer regexp to decide whether an array begins with 17
> and ends with 8.
>
> Specifically this is for specifying deterministic finite automatons,
> with an integer alphabet, for an upcoming constraint in Gecode/R[1].

I'd implement this using a generic repeat operator instead of
hardcoding * and +, and using a generic comparator instead of
restricting to Integers.

The declaration for this would be something like

# Use repeat(0, nil, ...) for *
# Use repeat(1, nil, ...) for +
# Use repeat(0, 1, ...) for ?
def repeat(minimum, maximum, repeated_expression)
def or(*expressions)

So you'd convert your example "(17|1 5)*" to repeat(0, nil, or(17, [1, 5]))

Actually, as I thought of after I had written the above, I have
written code that does part of this already, and is generic - it's
available in types.rb (http://people.freebsd.org/~eivind/r...).
If you want it, I think I can extend that to support repeats tonight.
Feel free to use code from it under any license that absolves me of
legal blame (ie, the only reason I don't put it in the public domain
is because that expose me to legal liability.)

Eivind.

Robert Klemme

5/5/2008 10:57:00 AM

0

2008/5/5 Andreas Launila <ruby-talk@lokorin.org>:
> Robert Klemme wrote:
> > 2008/5/5 Andreas Launila <ruby-talk@lokorin.org>:
> >> I'm trying to come up with a clean way to specify regexps in the integer
> >> domain. I.e. instead of describing a pattern of characters (as in normal
> >> regexps) one describes patterns of integers ("17 followed by 3 or 15"
> >> rather than "'a' followed by 'b' or 'c'").
> >
> > Why do you need this, i.e. what advantages do you expect for a
> > particular "integer regexp" over classic regular expressions?
>
> The two types of regexps are not really comparable since they work in
> different domains. I.e. classic regular expressions describe patterns in
> arrays of characters rather than in arrays of integers. One could for
> instance use an integer regexp to decide whether an array begins with 17
> and ends with 8.

Ah! That bit of information was missing from the original posting.

> Specifically this is for specifying deterministic finite automatons,
> with an integer alphabet, for an upcoming constraint in Gecode/R[1].

I see. Then of course it's a different story. Unless you translate
the array of integers into a string and apply a textual regexp. But
you can still do that internally, i.e. shield that away as an
implementation detail.

I'd probably choose the approach we had taken with texrex because it
is easy to implement and has good usability (albeit it's a bit
verbose). I would base the decision ultimately on the users of such a
package, i.e. if they are familiar with regular expressions then the
string based approach might be the better choice (short and concise)
while for others the wordy approach might be better.

Kind regards

robert

--
use.inject do |as, often| as.you_can - without end

Andreas Launila

5/5/2008 2:20:00 PM

0

Eivind Eklund wrote:
>
> I'd implement this using a generic repeat operator instead of
> hardcoding * and +, and using a generic comparator instead of
> restricting to Integers.
>
> The declaration for this would be something like
>
> # Use repeat(0, nil, ...) for *
> # Use repeat(1, nil, ...) for +
> # Use repeat(0, 1, ...) for ?
> def repeat(minimum, maximum, repeated_expression)
> def or(*expressions)
>
> So you'd convert your example "(17|1 5)*" to repeat(0, nil, or(17, [1, 5]))
>

I like that idea, especially using the arrays to group integers. It
seems like a rather economic way to do it. I think "any" would fit
better than "or" though (since it can take more than two arguments). The
example in its entirety would then be

[repeat(0, nil, any(17, [1, 5])), 4711]

I would also consider swapping the position of repeated_expression to
the first argument of repeat, since it seems like a more natural order
to me ("repeat expression between 0 and 4 times"). I'm a bit split on
using nil for infinity, but that is minor.

I will need to explore the idea, but it feels like a good fit to me.
Thank you.

> Actually, as I thought of after I had written the above, I have
> written code that does part of this already, and is generic - it's
> available in types.rb (http://people.freebsd.org/~eivind/r...).
> If you want it, I think I can extend that to support repeats tonight.
> Feel free to use code from it under any license that absolves me of
> legal blame (ie, the only reason I don't put it in the public domain
> is because that expose me to legal liability.)
>

Thanks for the offer, but there's no need to add additional support. The
important part is the idea, which you have provided.

--
Andreas Launila

Phrogz

5/5/2008 3:11:00 PM

0

On May 5, 2:24 am, Andreas Launila <ruby-t...@lokorin.org> wrote:
> I'm trying to come up with a clean way to specify regexps in the integer
> domain. I.e. instead of describing a pattern of characters (as in normal
> regexps) one describes patterns of integers ("17 followed by 3 or 15"
> rather than "'a' followed by 'b' or 'c'").

You could specify it as a PEG[1], possibly using Treetop[2] in some
way.

The benefit of the PEG (re-usable sub-expressions) may not outweigh
the multi-line nature. However, you could do something like LPEG[3]
does. Instead of using standard PEG syntax come up with a custom
syntax via operators that allows you to describe an entire grammar in
a single Ruby expression. I'm personally not a fan of the syntax LPEG
uses; you could come up with a better one, perhaps using terse method
names instead of operators in some cases. (Perhaps
"pattern.atleast(5)" instead of LPEG's "pattern^5".)

[1] http://en.wikipedia.org/wiki/Parsing_expressi...
[2] http://treetop.ruby...
[3] http://www.inf.puc-rio.br/~roberto...

Phrogz

5/5/2008 3:46:00 PM

0

On May 5, 9:10 am, Phrogz <phr...@mac.com> wrote:
> The benefit of the PEG (re-usable sub-expressions) may not outweigh
> the multi-line nature.

To expand on this a little bit more, and use an LPEG-like custom
grammar entirely in Ruby, using PEG instead of regexp would allow you
to do something like:

header = IPEG[17,34,15,12,89,127]
break = IPEG[13,10]
close = IPEG[0]

body = (IPEG[12,43,13,break,57,75] | IPEG[18,13,31])
message_style_1 = header + body + close
message_style_2 = header + IPEG[108,103] + break + body + close
message = message_style_1 | message_style_2

result = message.match( my_array_of_numbers )

As you can see, this allows you to break up complex expressions into
named parts. This makes it both easier to understand, and DRYs up the
expression.

In the above made up syntax:
* "IPEG" stands for "Integer Parsing Expression Grammar"
* The IPEG.[] notation might represent a literal sequence of
integers
* The + operator would combine sequences
* The | operator provides alternation


ara.t.howard

5/5/2008 4:48:00 PM

0


On May 5, 2008, at 2:24 AM, Andreas Launila wrote:
> I'm trying to come up with a clean way to specify regexps in the
> integer
> domain. I.e. instead of describing a pattern of characters (as in
> normal
> regexps) one describes patterns of integers ("17 followed by 3 or 15"
> rather than "'a' followed by 'b' or 'c'").
>
> My main objective is to make the syntax intuitive to Ruby users. I
> have
> been toying around with a few different approaches, but I'm not sure
> if
> any meet the goal. How would you design the syntax for regexps in the
> integer domain?
>
> The syntax does not need to support especially many operators:
> * Kleene star ('*' in character regexps)
> * "At least once" ('+' in character regexps)
> * Alternation ('|' in character regexps, i.e. "a|b" being "'a' or
> 'b'")
>
> Below are some of the approaches one could take.


i personally would not bother making a new syntax, rather i'd simply
use the exiting one and add a new atom builder:



cfp:~ > cat a.rb
N = 0.chr

def N n
"#{ n }#{ N }"
end

module Enumerable
def nmatch pattern
match = pattern.match( select{|i| Fixnum === i}.join(N) )
if match
returned = []
captured = match.to_a.map{|capture| capture.split N}
returned << captured.first.map{|s| Float(s).to_i}
captured[1..-1].each do |list|
returned << Float(list.first).to_i
end
returned
else
nil
end
end
end

pattern = %r/#{ N 42 }#{ N 7 }(#{ N 17 }){2}(#{ N 1 }){1,}/

p [42, 7, 17, 17, 1, 1, 1].nmatch( pattern ).to_a



cfp:~ > ruby a.rb
[[42, 7, 17, 17, 1, 1], 17, 1]




a @ http://codeforp...
--
we can deny everything, except that we have the possibility of being
better. simply reflect on that.
h.h. the 14th dalai lama




unbewusst.sein

5/5/2008 5:04:00 PM

0

ara.t.howard <ara.t.howard@gmail.com> wrote:

> we can deny everything, except that we have the possibility of being
> better. simply reflect on that.

which suppose their is a norm somewhere, ie. something able to measure
what is worst what is best...

--
Une Bévue

ara.t.howard

5/5/2008 5:12:00 PM

0


On May 5, 2008, at 11:05 AM, Une B=E9vue wrote:
> which suppose their is a norm somewhere, ie. something able to measure
> what is worst what is best...

i don't think the implication is an absolute one. whether relative or =20=

absolute i think it would be rare the to find a being that did not =20
know this about themselves at some, possibly deeply hidden, level.

a @ http://codeforp...
--
we can deny everything, except that we have the possibility of being =20
better. simply reflect on that.
h.h. the 14th dalai lama