[lnkForumImage]
TotalShareware - Download Free Software

Confronta i prezzi di migliaia di prodotti.
Asp Forum
 Home | Login | Register | Search 


 

Forums >

comp.lang.ruby

Re: [QUIZ][SOLUTION] Regexp.build() (#4

Warren Brown

10/18/2004 3:17:00 PM

To me, the real challenge of this quiz was to specify ranges as REs.

I took the challenge of trying to find the smallest RE representation
of a range.

Since the algorithm is complex enough, I decided to ignore certain
issues to keep the algorithm as understandable as possible. For
instance, I decided to not allow leading zeros (although this would be
a simple matter of a "0*" near the front of the RE), not to allow
negative numbers, not to allow underscores in integers, not to allow
other representations of integers (e.g. "0x11", "0o11", "0b11", "?a").
I did decide to keep each parameter in its own capture, so that you
could tell which parameter caused the match.

The algorithm is pretty straight-forward, except for the idea of a
"largest, 'roundest' number in range". Basically, the algorithm works
in two halves. The first half starts at the ones digit and moves up
to higher digits creating bigger and bigger ranges, up to a point. I
named this point "roundest". As an example of this half of the
algorithm, consider how the range 123..999 is expanded into
"12[3-9]|1[3-9]\d|[4-9]\d\d".

After the "roundest" number is reached, the algorithm starts working
back down towards the ones digit creating smaller and smaller ranges.
As an example of this half of the algorithm, consider how the range
100..234 is expanded into "1\d\d|2[0-2]\d|23[0-4]".

There were a surprising number of nooks and crannies to this
algorithm, and I'm not sure I've worked out all of the edge cases, so
I may be posting an update if I find another one.



class Regexp

# Create a namespace for our subroutines.
module Build

# Return the digit at index places from the right for int.
def self.digit(int,index)
(int / 10 ** index) % 10
end

# Build the smallest RE for the digits in the given range.
def self.digit_re(first,last)
case (last - first)
when 0 then first.to_s
when 1 then "[#{first}#{last}]"
when 9 then '\d'
else "[#{first}-#{last}]"
end
end

# Build a RE for first, varying to last at digit index.
def self.int_re(first,last,index)
first.to_s[0...-(index + 1)] +
digit_re(digit(first,index),digit(last,index)) +
'\d' * index
end

# Build a RE for the integers in the given range.
def self.range_re(first,last)
# Find first difference.
res = []
last_len = last.to_s.length
first_diff = 0
last_len.downto(0) do |first_diff|
break if digit(first,first_diff) != digit(last,first_diff)
end
# Find (largest, "roundest" number in range) - 1.
roundest = (last / 10 ** first_diff) * 10 ** first_diff - 1
# Handle everything from first to roundest - 1.
(0...first_diff).each do |index|
next if index < first_diff - 1 &&
digit(roundest,index) - digit(first,index) == 9
res << int_re(first,roundest,index)
first = (first / 10 ** (index + 1) + 1) * 10 ** (index + 1)
break if first > roundest
end
# Handle everything from roundest to last, except last digit.
(first_diff - 1).downto(1) do |index|
next if digit(last,index) == 0
tmp_last = (last / 10 ** index) * 10 ** index - 1
res << int_re(first,tmp_last,index)
first = tmp_last + 1
end
# Last digit is special.
res << int_re(first,last,0)
res.join('|')
end

end

# Build a RE for each argument.
def Regexp.build(*args)
res = []
args.each do |arg|
# If it is an integer, just match it.
if arg.respond_to?(:to_i)
res << arg.to_i
# If it is a range, build the RE.
elsif arg.respond_to?(:exclude_end?)
last = arg.exclude_end? ? arg.last - 1 : arg.last
res << Build::range_re(arg.first,last)
# Otherwise, error
else
$stderr.puts "Unknown argument (#{arg.inspect})."
end
end
# Combine REs.
Regexp.new("\\A(#{res.join(')|(')})\\z")
end
end


# Run some test cases.
p Regexp.build(1..1000000)
p Regexp.build(12345...24680)
p Regexp.build(123..123)
p Regexp.build(10..19)
p Regexp.build(1..102)
p Regexp.build(100..234)
p Regexp.build(1990..2010)
p Regexp.build(1..10,20,30..40,50,60,70..80)


- Warren Brown





1 Answer

James Gray

10/18/2004 6:51:00 PM

0

On Oct 18, 2004, at 10:16 AM, Warren Brown wrote:

> To me, the real challenge of this quiz was to specify ranges as
> REs.
>
> I took the challenge of trying to find the smallest RE representation
> of a range.

This brings up an interesting point. What is the smallest regex to
match 1..1_000_000?

^(1000000|[1-9]\d{0,5})$

That's my best guess.

James Edward Gray II