[lnkForumImage]
TotalShareware - Download Free Software

Confronta i prezzi di migliaia di prodotti.
Asp Forum
 Home | Login | Register | Search 


 

Forums >

comp.lang.ruby

Re: [QUIZ] Longest Repeated Substring (#153

James Koppel

1/21/2008 10:37:00 PM

My solution finds substrings of length n, groups the identical ones together in a hash of their starting indices, prunes out cases where two of the "repeated" substring overlap each other, and then looks in the same places for substrings of length n+1.

The aggressive pruning does, however, cause it to fail in the case where a substring that overlaps at one length does not overlap at a higher length. For example, take the string "aaaaaa". The substring "aa" exists at indicies [0,1,2,3,4]. My program will prune that down to [0,2,4]. It will then record the substring of length 3 that starts at each of [0,2,4], prune out the ones that start at 2 (overlaps with the one startign at 0) and 4 (goes out of bounds), leaving only an unrepeated substring, and would thus fall back and print "aa," even though the correct answer is "aaa," because the index of 3 had already been pruned.

On the other hand, the aggressive pruning does make my program somewhat fast. It takes about 45s on Also Sprach Zarathustra.

text = ARGF.read

substr_locs = {}

text.split(//).each_with_index do |c,idx|
if substr_locs[c]
substr_locs[c] << idx
else
substr_locs[c] = [idx]
end
end

substr_locs.each_pair do |substr,locs|
substr_locs.delete(substr) if locs.length == 1
end

len = 1
done = false
until done
len += 1
new_substr_locs = {}
substr_locs.each_pair do |substr,locs|
locs.each do |idx|
s = text[idx,len]
next if idx+len>text.length
if new_substr_locs[s]
new_substr_locs[s] << idx
else
new_substr_locs[s] = [idx]
end
end
end
##Must reduce for the case where a substring overlaps itself
new_substr_locs.each_key do |s|
locs = new_substr_locs[s]
idx = 1
while idx < locs.length
if locs[idx]-locs[idx-1] < len
locs.delete_at(idx)
else
idx += 1
end
end
new_substr_locs.delete(s) if locs.length == 1
end
unless new_substr_locs.keys.empty?
substr_locs = new_substr_locs
else
done = true
end
end
puts substr_locs.keys.first



____________________________________________________________________________________
Looking for last minute shopping deals?
Find them fast with Yahoo! Search. http://tools.search.yahoo.com/newsearch/category.php?categor...


5 Answers

Raffa

1/23/2008 5:32:00 PM

0

cannot use regex, i suppose:

(.{aNumber}).*\1



where aNumber = 2..x (x=text.length/2), until regex 'response' is
'nil'



James Gray

1/23/2008 6:09:00 PM

0

On Jan 23, 2008, at 11:35 AM, Raffa wrote:

> cannot use regex, i suppose:
>
> (.{aNumber}).*\1
>
>
>
> where aNumber = 2..x (x=text.length/2), until regex 'response' is
> 'nil'

You can use whatever you like.

Try it out on War & Peace though and see how it does. :)

James Edward Gray II


Sergey Volkov

1/23/2008 6:22:00 PM

0

Why not?
The simplest solution could be:

def longest_repeated_substring str
(str.size/2).downto(1) { |i|
/(.{#{i}}).*\1/m =~ str and return $1
}
nil
end

but it's too slow;

On Jan 23, 2008 12:35 PM, Raffa <raffaboss@gmail.com> wrote:
> cannot use regex, i suppose:
>
> (.{aNumber}).*\1
>
>
>
> where aNumber = 2..x (x=text.length/2), until regex 'response' is
> 'nil'

Ken Bloom

1/24/2008 5:38:00 AM

0

On Wed, 23 Jan 2008 13:21:31 -0500, SERGEY VOLKOV wrote:

> Why not?
> The simplest solution could be:
>
> def longest_repeated_substring str
> (str.size/2).downto(1) { |i|
> /(.{#{i}}).*\1/m =~ str and return $1
> }
> nil
> end
>
> but it's too slow;
>
> On Jan 23, 2008 12:35 PM, Raffa <raffaboss@gmail.com> wrote:
>> cannot use regex, i suppose:
>>
>> (.{aNumber}).*\1
>>
>>
>>
>> where aNumber = 2..x (x=text.length/2), until regex 'response' is 'nil'

Well, one would think that the absolute simplest solution is /(.*+).*\1/,
but that's what I was testing when I invented the "your banana my banana"
test case. (Which it failed, returning only "y" as the repeated
substring).

Yours is a nice extension of this basic idea.

--Ken

--
Ken (Chanoch) Bloom. PhD candidate. Linguistic Cognition Laboratory.
Department of Computer Science. Illinois Institute of Technology.
http://www.iit.edu...

Sergey Volkov

1/24/2008 12:17:00 PM

0

It's linear search for repeated substring length,
I've tried binary search, too slow too

On Jan 24, 2008 12:39 AM, Ken Bloom <kbloom@gmail.com> wrote:
>
> On Wed, 23 Jan 2008 13:21:31 -0500, SERGEY VOLKOV wrote:
>
> > Why not?
> > The simplest solution could be:
> >
> > def longest_repeated_substring str
> > (str.size/2).downto(1) { |i|
> > /(.{#{i}}).*\1/m =~ str and return $1
> > }
> > nil
> > end
> >
> > but it's too slow;
> >
> > On Jan 23, 2008 12:35 PM, Raffa <raffaboss@gmail.com> wrote:
> >> cannot use regex, i suppose:
> >>
> >> (.{aNumber}).*\1
> >>
> >>
> >>
> >> where aNumber = 2..x (x=text.length/2), until regex 'response' is 'nil'
>
> Well, one would think that the absolute simplest solution is /(.*+).*\1/,
> but that's what I was testing when I invented the "your banana my banana"
> test case. (Which it failed, returning only "y" as the repeated
> substring).
>
> Yours is a nice extension of this basic idea.
>
>
> --Ken
>
> --
> Ken (Chanoch) Bloom. PhD candidate. Linguistic Cognition Laboratory.
> Department of Computer Science. Illinois Institute of Technology.
> http://www.iit.edu...
>
>