[lnkForumImage]
TotalShareware - Download Free Software

Confronta i prezzi di migliaia di prodotti.
Asp Forum
 Home | Login | Register | Search 


 

Forums >

comp.lang.ruby

Q: most efficient way to remove duplicate spaces in a string?

Mark Watson

1/9/2009 6:21:00 PM

I don't usually worry too much about efficiency unless runtime
performance becomes an issue so I use a_string.split.join(' ') to
remove extra spaces because the code is short and readable.

This is certainly inefficient. What is the fastest idiom for this
operation?

Thanks,
Mark
13 Answers

Chris Shea

1/9/2009 6:30:00 PM

0

On Jan 9, 11:20 am, Mark Watson <mark.wat...@gmail.com> wrote:
> I don't usually worry too much about efficiency unless runtime
> performance becomes an issue so I use a_string.split.join(' ') to
> remove extra spaces because the code is short and readable.
>
> This is certainly inefficient. What is the fastest idiom for this
> operation?
>
> Thanks,
> Mark

Haven't benchmarked it, but this sounds like a case for regular
expressions.

a_string.gsub!(/ +/, ' ')

HTH,
Chris

pat eyler

1/9/2009 6:48:00 PM

0

On Fri, Jan 9, 2009 at 11:29 AM, Chris Shea <cmshea@gmail.com> wrote:
> On Jan 9, 11:20 am, Mark Watson <mark.wat...@gmail.com> wrote:
>> I don't usually worry too much about efficiency unless runtime
>> performance becomes an issue so I use a_string.split.join(' ') to
>> remove extra spaces because the code is short and readable.
>>
>> This is certainly inefficient. What is the fastest idiom for this
>> operation?
>>
>> Thanks,
>> Mark
>
> Haven't benchmarked it,


require 'benchmark'

n = 1_000_000

Benchmark.bm(10) do |x|
x.report("gsub") { n.times do
"This is a test string.".gsub(/ +/, ' ')
end
}
x.report("gsub!") { n.times do
"This is a test string.".gsub!(/ +/, ' ')
end
}
x.report("split.join") { n.times do
"This is a test string.".split.join(' ')
end
}
end


user system total real
gsub 6.310000 0.080000 6.390000 ( 7.135028)
gsub! 6.300000 0.070000 6.370000 ( 7.064374)
split.join 3.940000 0.070000 4.010000 ( 4.160529)


or, with bmbm

pate@pate:~/scratch$ ./stringcleanup.rb
Rehearsal ----------------------------------------------
gsub 6.290000 0.170000 6.460000 ( 7.956565)
gsub! 6.370000 0.170000 6.540000 ( 8.128309)
split.join 3.990000 0.080000 4.070000 ( 4.832332)
------------------------------------ total: 17.070000sec

user system total real
gsub 6.220000 0.130000 6.350000 ( 7.613061)
gsub! 6.390000 0.080000 6.470000 ( 8.284855)
split.join 3.970000 0.120000 4.090000 ( 5.684406)


Please, benchmark first. It's not that hard:

http://on-ruby.bl.../2008/12/benchmarking-makes-it-b...



> but this sounds like a case for regular
> expressions.
>
> a_string.gsub!(/ +/, ' ')
>
> HTH,
> Chris
>
>



--
thanks,
-pate
-------------------------
Duty makes us do things, Love make us do things well.
http://on-ruby.bl...
http://eldersjournal.bl...

Patrick Doyle

1/9/2009 6:57:00 PM

0

[Note: parts of this message were removed to make it a legal post.]

On Fri, Jan 9, 2009 at 1:24 PM, Mark Watson <mark.watson@gmail.com> wrote:

> I don't usually worry too much about efficiency unless runtime
> performance becomes an issue so I use a_string.split.join(' ') to
> remove extra spaces because the code is short and readable.
>
> This is certainly inefficient. What is the fastest idiom for this
> operation?
>
I was looking at the docs the other day and stumbled across String#squeeze.
You might try benchmarking that.

--wpd

Robert Klemme

1/9/2009 6:58:00 PM

0

On 09.01.2009 19:47, pat eyler wrote:
> On Fri, Jan 9, 2009 at 11:29 AM, Chris Shea <cmshea@gmail.com> wrote:
>> On Jan 9, 11:20 am, Mark Watson <mark.wat...@gmail.com> wrote:
>>> I don't usually worry too much about efficiency unless runtime
>>> performance becomes an issue so I use a_string.split.join(' ') to
>>> remove extra spaces because the code is short and readable.
>>>
>>> This is certainly inefficient. What is the fastest idiom for this
>>> operation?
>>>
>>> Thanks,
>>> Mark
>> Haven't benchmarked it,
>
>
> require 'benchmark'
>
> n = 1_000_000
>
> Benchmark.bm(10) do |x|
> x.report("gsub") { n.times do
> "This is a test string.".gsub(/ +/, ' ')
> end
> }
> x.report("gsub!") { n.times do
> "This is a test string.".gsub!(/ +/, ' ')
> end
> }
> x.report("split.join") { n.times do
> "This is a test string.".split.join(' ')
> end
> }
> end

Note that these behave differently. A more appropriate comparison would
be to use /\s+/ as regular expression for gsub!.

Interesting figures btw. I would have guessed that gsub! is fastest -
live and learn.

Kind regards

robert


--
remember.guy do |as, often| as.you_can - without end

Craig Demyanovich

1/9/2009 7:01:00 PM

0

[Note: parts of this message were removed to make it a legal post.]

I was just doing that. :-)

require 'benchmark'

n = 1_000_000

Benchmark.bm(10) do |x|
x.report("gsub") { n.times do
"This is a test string.".gsub(/ +/, ' ')
end
}
x.report("gsub!") { n.times do
"This is a test string.".gsub!(/ +/, ' ')
end
}
x.report("split.join") { n.times do
"This is a test string.".split.join(' ')
end
}
x.report("squeeze.strip") { n.times do
"This is a test string.".squeeze(' ').strip
end
}
end

user system total real
gsub 4.470000 0.040000 4.510000 ( 4.578173)
gsub! 4.390000 0.020000 4.410000 ( 4.468695)
split.join 3.500000 0.020000 3.520000 ( 3.556669)
squeeze.strip 1.980000 0.010000 1.990000 ( 2.003622)

Regards,
Craig

Tim Greer

1/9/2009 7:52:00 PM

0

Craig Demyanovich wrote:

> [Note: parts of this message were removed to make it a legal post.]
>
> I was just doing that. :-)
>
> require 'benchmark'
>
> n = 1_000_000
>
> Benchmark.bm(10) do |x|
> x.report("gsub") { n.times do
> "This is a test string.".gsub(/ +/, ' ')
> end
> }
> x.report("gsub!") { n.times do
> "This is a test string.".gsub!(/ +/, ' ')
> end
> }
> x.report("split.join") { n.times do
> "This is a test string.".split.join(' ')
> end
> }
> x.report("squeeze.strip") { n.times do
> "This is a test string.".squeeze(' ').strip
> end
> }
> end
>
> user system total real
> gsub 4.470000 0.040000 4.510000 ( 4.578173)
> gsub! 4.390000 0.020000 4.410000 ( 4.468695)
> split.join 3.500000 0.020000 3.520000 ( 3.556669)
> squeeze.strip 1.980000 0.010000 1.990000 ( 2.003622)
>
> Regards,
> Craig


The ruby version plays a big role in squeeze.strip, as it's much slower
than split.join with an older version on one of my systems.

Also, try the following variations and see the slight speed increase for
gsub/gsub! (small increase, but interesting to note):

I.e.:

n = 1_000_000

string = "This is a test string."

Benchmark.bm(10) do |x|
x.report("gsub") { n.times do
string.gsub(/ +/, ' ')
end
}
x.report("gsub #2") { n.times do
string.gsub(/ {2,}/, ' ')
end
}
x.report("gsub #3") { n.times do
string.gsub(/ +/, ' ')
end
}

old ruby 1.8.1 on one system:

user system total real
gsub 9.090000 0.000000 9.090000 ( 9.111154)
gsub #2 8.600000 0.000000 8.600000 ( 8.643407)
gsub #3 7.560000 0.000000 7.560000 ( 7.579185)
gsub! 8.060000 0.000000 8.060000 ( 8.061861)
gsub! #2 8.110000 0.000000 8.110000 ( 8.115992)
split.join 4.730000 0.000000 4.730000 ( 4.733248)
squeeze.strip 6.140000 0.000000 6.140000 ( 6.132690)

Ruby 1.8.7 on another system:

user system total real
gsub 3.660000 0.000000 3.660000 ( 3.663209)
gsub #2 3.450000 0.000000 3.450000 ( 3.455171)
gsub #3 3.070000 0.000000 3.070000 ( 3.065939)
gsub! 3.500000 0.000000 3.500000 ( 3.517176)
gsub! #2 3.510000 0.000000 3.510000 ( 3.506881)
split.join 2.550000 0.000000 2.550000 ( 2.560460)
squeeze.strip 1.580000 0.000000 1.580000 ( 1.588717)

I'll see if I can get a chance to upgrade ruby on the older system to
see the results.
--
Tim Greer, CEO/Founder/CTO, BurlyHost.com, Inc.
Shared Hosting, Reseller Hosting, Dedicated & Semi-Dedicated servers
and Custom Hosting. 24/7 support, 30 day guarantee, secure servers.
Industry's most experienced staff! -- Web Hosting With Muscle!

Mark Watson

1/9/2009 8:23:00 PM

0

On Jan 9, 11:20 am, Mark Watson <mark.wat...@gmail.com> wrote:
> I don't usually worry too much about efficiency unless runtime
> performance becomes an issue so I use a_string.split.join(' ') to
> remove extra spaces because the code is short and readable.
>
> This is certainly inefficient. What is the fastest idiom for this
> operation?
>
> Thanks,
> Mark

Thanks to everyone who responded!

Best regards,
Mark

Clifford Heath

1/10/2009 1:17:00 AM

0

Robert Klemme wrote:
> Note that these behave differently...
> Interesting figures btw. I would have guessed that gsub! is fastest -
> live and learn

It still might be - the benchmark doesn't run long enough to
compare the GC overhead of making dozens of little strings
that get used once each.

pat eyler

1/10/2009 6:59:00 PM

0

On Fri, Jan 9, 2009 at 6:20 PM, Clifford Heath <no@spam.please.net> wrote:
> Robert Klemme wrote:
>>
>> Note that these behave differently...
>> Interesting figures btw. I would have guessed that gsub! is fastest -
>> live and learn
>
> It still might be - the benchmark doesn't run long enough to
> compare the GC overhead of making dozens of little strings
> that get used once each.
>
>

Is this better?

#!/usr/bin/ruby

require 'benchmark'

n = 1_000_000

Benchmark.bm(10) do |x|
x.report("gsub") { n.times do
a_string = "This is a test string."
a_string.gsub(/ +/, ' ')
end
}
x.report("gsub!") { n.times do
a_string = "This is a test string."
a_string.gsub!(/ +/, ' ')
end
}
x.report("split.join") { n.times do
a_string = "This is a test string."
a_string.split.join(' ')
end
}
x.report("squeeze.strip") { n.times do
a_string = "This is a test string."
a_string.squeeze(' ').strip
end
}
end



$ ./stringcleanup.rb
user system total real
gsub 7.170000 0.140000 7.310000 ( 11.034766)
gsub! 6.900000 0.150000 7.050000 ( 14.113575)
split.join 4.450000 0.110000 4.560000 ( 7.518476)
squeeze.strip 3.180000 0.100000 3.280000 ( 4.943830)


--
thanks,
-pate
-------------------------
Duty makes us do things, Love make us do things well.
http://on-ruby.bl...
http://eldersjournal.bl...

Tim Greer

1/10/2009 8:13:00 PM

0

pat eyler wrote:

> On Fri, Jan 9, 2009 at 6:20 PM, Clifford Heath <no@spam.please.net>
> wrote:
>> Robert Klemme wrote:
>>>
>>> Note that these behave differently...
>>> Interesting figures btw. I would have guessed that gsub! is fastest
>>> - live and learn
>>
>> It still might be - the benchmark doesn't run long enough to
>> compare the GC overhead of making dozens of little strings
>> that get used once each.
>>
>>
>
> Is this better?
>
> #!/usr/bin/ruby
>
> require 'benchmark'
>
> n = 1_000_000
>
> Benchmark.bm(10) do |x|
> x.report("gsub") { n.times do
> a_string = "This is a test string."
> a_string.gsub(/ +/, ' ')
> end
> }
> x.report("gsub!") { n.times do
> a_string = "This is a test string."
> a_string.gsub!(/ +/, ' ')
> end
> }
> x.report("split.join") { n.times do
> a_string = "This is a test string."
> a_string.split.join(' ')
> end
> }
> x.report("squeeze.strip") { n.times do
> a_string = "This is a test string."
> a_string.squeeze(' ').strip
> end
> }
> end
>
>
>
> $ ./stringcleanup.rb
> user system total real
> gsub 7.170000 0.140000 7.310000 ( 11.034766)
> gsub! 6.900000 0.150000 7.050000 ( 14.113575)
> split.join 4.450000 0.110000 4.560000 ( 7.518476)
> squeeze.strip 3.180000 0.100000 3.280000 ( 4.943830)
>
>

For gsub/gsub!, instead of replacing one or more white space with a
white space, speed it up by replacing two or more white space with a
white space. This saves unneeded processing by not replacing single
white space. I.e., instead of gsub(/ +/, ' '), try gsub(/ +/, ' ') or
gsub(/ {2,}/, ' ') and benchmark them (they should be faster). Of
course, it's still not going to be as fast as split.join or
squeeze.strip (at least depending on the version of ruby used, as older
ruby versions may put squeeze.strip markedly slower than split.join.
--
Tim Greer, CEO/Founder/CTO, BurlyHost.com, Inc.
Shared Hosting, Reseller Hosting, Dedicated & Semi-Dedicated servers
and Custom Hosting. 24/7 support, 30 day guarantee, secure servers.
Industry's most experienced staff! -- Web Hosting With Muscle!