skye.shaw
7/24/2007 4:59:00 AM
On Jul 23, 9:10 pm, James Edward Gray II <ja...@grayproductions.net>
wrote:
> On Jul 23, 2007, at 3:54 PM, bwv549 wrote:
>
> > I have a long string and I need to know the starting position of all
> > substrings matching a particular sequence. Most importantly, it needs
> > to be fast. Secondly, it would be nice if it was somewhat concise.
<snip solutions>
> Another idea is to use String#index() with the optional second
> minimum index parameter which you keep incrementing.
>
> James Edward Gray II
Indeed.
d-o-ibook-g4:~ d-o$ uname -a
Darwin d-o-ibook-g4.local 8.8.0 Darwin Kernel Version 8.8.0: Fri Sep
8 17:18:57 PDT 2006; root:xnu-792.12.6.obj~1/RELEASE_PPC Power
Macintosh powerp
d-o-ibook-g4:~ d-o$ cat bs.rb
require 'benchmark'
require 'enumerator'
COUNT=1_000_000
class String
def scan_p ss
a = []
self.scan(ss){a << Regexp.last_match.begin(0)}
a
end
end
substr = Proc.new do |s,seq|
pos=0
index=[]
while((i=s.index(seq,pos))!=nil)
index<<i
pos=i+seq.length
end
index
end
substr_positions = Proc.new do |string,substring|
string.enum_for(:scan, substring).map { $~.offset(0)[0] }
end
str="assbasscassmassthatassbass"
seq="ss"
Benchmark.bm do |x|
x.report("with index(): ") do; 1.upto(COUNT) do;
substr.call(seq,str); end end
x.report("with enum_for()/map(): ") do; 1.upto(COUNT) do;
substr_positions.call(str,seq); end end
x.report("with scan(): ") do; 1.upto(COUNT) do; str.scan_p(seq); end
end
end
d-o-ibook-g4:~ d-o$ ruby bs.rb
user system total real
with index(): 10.380000 0.060000 10.440000 ( 10.670129)
with enum_for()/map(): 88.740000 0.910000 89.650000 ( 98.726685)
with scan(): 57.620000 0.590000 58.210000 ( 65.996496)