[lnkForumImage]
TotalShareware - Download Free Software

Confronta i prezzi di migliaia di prodotti.
Asp Forum
 Home | Login | Register | Search 


 

Forums >

comp.lang.ruby

How about Enumerable#find_pattern?

A. S. Bradbury

9/10/2006 11:52:00 AM

Ignore the name, I don't really know what it's best to call it. Basically I've
been thinking it might be useful if Enumerable had a method to efficiently
search for a pattern in an Enumerable object. Existing methods like #find and
#grep work only on matching a single object.

This method might be implemented using the Knuth-Morris-Pratt algorithm
(http://en.wikipedia.org/wiki/Knuth-Morris-Pratt...). It just seems to
me that it's a useful feature for Enumerable objects to have. A brute force
search is easy using #each_cons, and of course users can write their own
KMP-based searches, it just strikes me as something that has general utility.
There'd need to be one to return all matches and another to return only the
first. The method should return the position of the start of the match.
Comments?

Alex

10 Answers

dblack

9/10/2006 12:03:00 PM

0

A. S. Bradbury

9/10/2006 12:11:00 PM

0

On Sunday 10 September 2006 13:03, dblack@wobblini.net wrote:
> #find, yes, but #grep does something similar to what I understand you
> to be describing:
>
> %w{a b c}.grep(/./)
> => ["a", "b", "c"]
> irb(main):002:0> str = "a\nb\nc"
> => "a\nb\nc"
> irb(main):003:0> str.grep(/./)
> => ["a\n", "b\n", "c"]
>

I'm thinking this:
foo=[1,2,3,4,5,6,5,4,3,2,1]
foo.find_pattern(5,6,5) #=> 4 (position of the start of this pattern)

Alex

A. S. Bradbury

9/10/2006 2:14:00 PM

0

On Sunday 10 September 2006 15:04, Robert Dober wrote:
> class Array
> def pattern( *args )
> l = args.length
> c = dup
> while c.first(l) != c do
> c.shift
> return nil if c.length < l
> end
> length - c.length
> end
> end

That is one possible implementation, and this isn't something I'm having a
problem implementing. I'm just asking whether it would be appropriate for
there to be a pattern matcher for Enumerable objects in core implemented
using an efficient algorithm such as Knuth-Morris-Pratt. To me it just seems
useful to be able to efficiently search Enumerable objects without having to
implement it myself. It's a very small addition that just makes Enumerable
objects that much more useful.

Alex

A. S. Bradbury

9/10/2006 2:42:00 PM

0

On Sunday 10 September 2006 15:04, Robert Dober wrote:
> class Array
>     def pattern( *args )
>         l = args.length
>         c = dup
>         while c.first(l) != c do
>             c.shift
>             return nil if c.length < l
>         end
>         length - c.length
>     end
> end

'while c.first(l) != c do' should of course be:
while c.first(l) != args do

Alex

A. S. Bradbury

9/12/2006 12:30:00 PM

0

Well, here's a (not very good) implementation, kind of ported from some python
code.

module Enumerable

def find_pattern(*pat)
match_length=0
match_pos=0
shifts=compute_table(pat)
self.each do |obj|
while (match_length >=0) and !(pat[match_length]===obj)
match_pos+=shifts[match_length]
match_length-=shifts[match_length]
end
match_length+=1
if match_length == pat.length
return match_pos
end
end
return nil # failed match
end

private
def compute_table(pat)
shifts=Array.new(pat.size)
shift = 1
0.upto pat.size do |i|
a=pat[i-1]
b=pat[i-shift-1]
while (shift < i) and (pat[i-1] != pat[i-shift-1])
shift += shifts[i-shift-1]
end
shifts[i]=shift
end
return shifts
end
end

a="aaaabbaabbab".split(//)
a.find_pattern 'a', 'b', 'b' #=> 3
a.find_pattern 'c' #=> nil
a.find_pattern /a|b/, 'a', 'b' #=> 2

The idea is to allow efficient searching for patterns in the elements of any
Enumerable object, the example above is contrived.

Alex

Rick DeNatale

9/12/2006 6:50:00 PM

0

On 9/12/06, A. S. Bradbury <asbradbury@tekcentral.org> wrote:
> Well, here's a (not very good) implementation, kind of ported from some python
> code.
>
> module Enumerable
>
> def find_pattern(*pat)
> match_length=0
> match_pos=0
> shifts=compute_table(pat)
> self.each do |obj|
> while (match_length >=0) and !(pat[match_length]===obj)
> match_pos+=shifts[match_length]
> match_length-=shifts[match_length]
> end
> match_length+=1
> if match_length == pat.length
> return match_pos
> end
> end
> return nil # failed match
> end
>
> private
> def compute_table(pat)
> shifts=Array.new(pat.size)
> shift = 1
> 0.upto pat.size do |i|
> a=pat[i-1]
> b=pat[i-shift-1]
> while (shift < i) and (pat[i-1] != pat[i-shift-1])
> shift += shifts[i-shift-1]
> end
> shifts[i]=shift
> end
> return shifts
> end
> end
>
> a="aaaabbaabbab".split(//)
> a.find_pattern 'a', 'b', 'b' #=> 3
> a.find_pattern 'c' #=> nil
> a.find_pattern /a|b/, 'a', 'b' #=> 2
>
> The idea is to allow efficient searching for patterns in the elements of any
> Enumerable object, the example above is contrived.

Nice, but it can fall down if the pattern contains a regexp which
matches more than one element:

a = "aaaabbabbab".split(//)
a.find_pattern /aab/, 'b' #=> nil

There's probably a way to fix that, but I'm not sure I can see how.

I did a pretty slavish translation to ruby of the KMP algorithm as
given in the Wikipedia article. It involved turning the enumerable
into either an array or a string so that it could be indexed instead
of using each. That didn't allow regexps at all though.

I've been working on hacking that to work with regexps, but that would
really only work if the enumerable was a string anyway, so it would
probably be better to do a specialized implementation in String.


--
Rick DeNatale

My blog on Ruby
http://talklikeaduck.denh...

IPMS/USA Region 12 Coordinator
http://ipmsr12.denh...

Visit the Project Mercury Wiki Site
http://www.mercuryspace...

A. S. Bradbury

9/12/2006 7:14:00 PM

0

On Tuesday 12 September 2006 19:50, Rick DeNatale wrote:
> Nice, but it can fall down if the pattern contains a regexp which
> matches more than one element:
>
> a = "aaaabbabbab".split(//)
> a.find_pattern /aab/, 'b' #=> nil
>
> There's probably a way to fix that, but I'm not sure I can see how.

I'd actually argue that's expected behaviour and you're just using it
wrong(!). /aab/ simply doesn't match any element, so it *should* fail. The
main problem with my current implementation is it doesn't really skip ahead
when it should do, it just continues to the next iteration. This is where
it's easier when you're using an index and a while loop (and the object
responds_to []), but we can't assume that for an Enumerable-compatible
implementation.

The Enumerable mixin provides #to_a, but relying on this seems like a poor
solution...

> I did a pretty slavish translation to ruby of the KMP algorithm as
> given in the Wikipedia article. It involved turning the enumerable
> into either an array or a string so that it could be indexed instead
> of using each. That didn't allow regexps at all though.

> I've been working on hacking that to work with regexps, but that would
> really only work if the enumerable was a string anyway, so it would
> probably be better to do a specialized implementation in String.

The string approach really doesn't seem like the right way to go about this,
if I understand you correctly. Not for the problem I'm trying to solve at
least. The idea of #find_pattern is that it will work for any arbitrary
object.

Perhaps you should be able to specify a comparison function in a block upon
calling find_pattern.

Alex

Alex

Rick DeNatale

9/13/2006 8:03:00 PM

0

On 9/12/06, A. S. Bradbury <asbradbury@tekcentral.org> wrote:
> On Tuesday 12 September 2006 19:50, Rick DeNatale wrote:
> > Nice, but it can fall down if the pattern contains a regexp which
> > matches more than one element:
> >
> > a = "aaaabbabbab".split(//)
> > a.find_pattern /aab/, 'b' #=> nil
> >
> > There's probably a way to fix that, but I'm not sure I can see how.
>
> I'd actually argue that's expected behaviour and you're just using it
> wrong(!). /aab/ simply doesn't match any element, so it *should* fail.

Well, I'd expect that if you allow regexps that you should allow
regexps! But see below.

> The
> main problem with my current implementation is it doesn't really skip ahead
> when it should do, it just continues to the next iteration. This is where
> it's easier when you're using an index and a while loop (and the object
> responds_to []), but we can't assume that for an Enumerable-compatible
> implementation.
>
> The Enumerable mixin provides #to_a, but relying on this seems like a poor
> solution...

Not sure why.

> > I did a pretty slavish translation to ruby of the KMP algorithm as
> > given in the Wikipedia article. It involved turning the enumerable
> > into either an array or a string so that it could be indexed instead
> > of using each. That didn't allow regexps at all though.
>
> > I've been working on hacking that to work with regexps, but that would
> > really only work if the enumerable was a string anyway, so it would
> > probably be better to do a specialized implementation in String.
>
> The string approach really doesn't seem like the right way to go about this,
> if I understand you correctly. Not for the problem I'm trying to solve at
> least. The idea of #find_pattern is that it will work for any arbitrary
> object.

But then regexps won't work except for a string.


Also, strings are a funny kind of enumerable here, since by default,
they just yield themselves in each, unless they include newlines. I'd
think that the general use of find_pattern in a string would be to
search for the pattern in the string, not as a pattern of the lines in
the string.


> Perhaps you should be able to specify a comparison function in a block upon
> calling find_pattern.

Of course this would require careful interface design to expand the
generality while preserving the efficiency.

--
Rick DeNatale

My blog on Ruby
http://talklikeaduck.denh...

A. S. Bradbury

9/13/2006 8:19:00 PM

0

On Wednesday 13 September 2006 21:03, Rick DeNatale wrote:
> On 9/12/06, A. S. Bradbury <asbradbury@tekcentral.org> wrote:
> > On Tuesday 12 September 2006 19:50, Rick DeNatale wrote:
> > > Nice, but it can fall down if the pattern contains a regexp which
> > > matches more than one element:
> > >
> > > a = "aaaabbabbab".split(//)
> > > a.find_pattern /aab/, 'b' #=> nil
> > >
> > > There's probably a way to fix that, but I'm not sure I can see how.
> >
> > I'd actually argue that's expected behaviour and you're just using it
> > wrong(!). /aab/ simply doesn't match any element, so it *should* fail.
>
> Well, I'd expect that if you allow regexps that you should allow
> regexps! But see below.

But why would it be expected behaviour for a regexp to try to match across
multiple items? The regexp is applied to a single item of the Enumerable
object, surely that's the only sensible way to deal with this?.

%w[This is an example].find_pattern /an example/ #=> false.
Surely you wouldn't expect the above snippet to automagically match?

> > The Enumerable mixin provides #to_a, but relying on this seems like a
> > poor solution...
>
> Not sure why.

I'm just thinking that then you have to build the array in memory. It's never
going to be a big deal unless you have *really large* arrays. It just seems
cleaner to seek a solution that examines a single member of the Enumerable
object at a time.

> > The string approach really doesn't seem like the right way to go about
> > this, if I understand you correctly. Not for the problem I'm trying to
> > solve at least. The idea of #find_pattern is that it will work for any
> > arbitrary object.
>
> But then regexps won't work except for a string.

Well no, #find_pattern isn't meant to be tied to regexp matching. You just
could use it for that. I'm not sure I'm really following you that well on
this point.

> Also, strings are a funny kind of enumerable here, since by default,
> they just yield themselves in each, unless they include newlines. I'd
> think that the general use of find_pattern in a string would be to
> search for the pattern in the string, not as a pattern of the lines in
> the string.

Well, you'd use it in a way that is meaningful for your purposes. I'm
basically using this to find patterns in an array of tokens. If my
tokenization rules were simple, perhaps a regexp could be compiled, but it's
really not possible in this situation.

Alex

Matt McQueen

11/21/2008 2:57:00 PM

0

OK, did some error trapping stuff to solve the issue.

Thanks for your help.

Matt