Robert Klemme
12/28/2004 1:16:00 PM
"David Garamond" <lists@zara.6.isreserved.com> schrieb im Newsbeitrag
news:41D146A1.8070706@zara.6.isreserved.com...
> I use 128bit GUID values a lot, and on my Guid class there's the
> Guid.from_base36 constructor and Guid#to_base36 instance method.
>
> Base36 is a representation of GUID value using the digits "a".."z",
> "0".."9" (in that particular order). Some of the nice properties of this
> representation:
>
> - shorter than the hexdigit representation (25 chars instead of 32/36).
>
> - can be conveniently used as identifiers (variable & function names,
> URL query parameters, table & column names, etc); note that the
> first digit always falls between "a".."p", because 36**25 > 2**128.
>
> - can be conveniently used as filenames (even on Windows, since it
> does not depend on case differences).
>
> The Guid class stores the GUID value as 16-byte string (@val). Here's
> the conversion code mentioned above:
>
> def Guid.from_base36(val)
> val = val.downcase
The general rule is that object creation is quite expensive. So it's
always good to try to avoid it if possible.
You can save the downcase if you add values for lower and upper case
characters to @@rd36 (if it's an array, maybe make it a hash).
> raise ArgumentError unless val =~ /\A[a-z][a-z0-9]{24}\z/
> n = 0
> mult = 1
> val.reverse.each_byte {|c|
> n += @@rd36[c] * mult
> mult *= 36
> }
Maybe this looping is more efficient:
(val.length - 1).downto(0) {|i|
n += @@rd36[val[c]] * mult
mult *= 36
}
> Guid.from_i(n)
Not a performance thingy but you can omit "Guid." here. This is less
error prone if you rename the class.
> end
>
> def to_base36
> self.to_i.to_s(36).tr('0-9a-z', 'a-z0-9').rjust(25, 'a')
tmp = to_i.to_s(36)
tmp.tr!('0-9a-z', 'a-z0-9')
tmp.rjust(25, 'a')
Maybe this can be even further optimized if you can save instance
creations.
Kind regards
robert
> end
>
> Benchmark.measure shows that, on my box, I can do around 2000-2500 of
> roundtrip conversions per second, which is not too bad. But I wonder if
> it can be made more efficient. The Ruby profiler shows the top 4
methods:
>
> % cumulative self self total
> time seconds seconds calls ms/call ms/call name
> 33.59 0.43 0.43 50 8.60 17.60 String#each_byte
> 14.06 0.61 0.18 1550 0.12 0.17 Fixnum#*
> 9.38 0.73 0.12 1900 0.06 0.06 Bignum#*
> 8.59 0.84 0.11 50 2.20 3.00 Guid#to_i
>
> which is kind of disappointing since I was hoping to still be able to
> store GUID values as 16-byte strings, for compactness. Here's the
> complete code (91 lines, hope it's not too long...)
>
> ===================================================================
> class Guid
> @@d36 = ('a'..'z').to_a + ('0'..'9').to_a
> @@rd36 = {}
> @@d36.each_with_index {|d, i| @@rd36[d[0]] = i}
>
> attr_reader :val
>
> def initialize(val=nil)
> if val
> raise ArgumentError unless val.length==16
> else
> val = (1..16).collect{|c| rand(256).chr}.join
> end
> @val = val
> end
>
> def to_s
> self.to_hex
> end
>
> def ==(other)
> @val == other.val
> end
>
> def Guid.from_hex(val)
> h = '[0-9A-Fa-f]'
> raise ArgumentError unless
> val =~ /\A#{h}{8}-#{h}{4}-#{h}{4}-#{h}{4}-#{h}{12}\z/
> val.gsub! /-/, ''
> Guid.new([val].pack('H32'))
> end
>
> def to_hex
> @val.unpack('H8H4H4H4H12').join '-'
> end
>
> def Guid.from_i(val)
> Guid.new([
> (val & 0xffffffff000000000000000000000000) >> 96,
> (val & 0x00000000ffffffff0000000000000000) >> 64,
> (val & 0x0000000000000000ffffffff00000000) >> 32,
> (val & 0x000000000000000000000000ffffffff)
> ].pack('NNNN'))
> end
>
> def to_i
> (@val[ 0 .. 3].unpack('N')[0] << 96) +
> (@val[ 4 .. 7].unpack('N')[0] << 64) +
> (@val[ 8 .. 11].unpack('N')[0] << 32) +
> (@val[12 .. 15].unpack('N')[0])
> end
>
> def Guid.from_base36(val)
> val = val.downcase
> raise ArgumentError unless val =~ /\A[a-z][a-z0-9]{24}\z/
> n = 0
> mult = 1
> val.reverse.each_byte {|c|
> n += @@rd36[c] * mult
> mult *= 36
> }
> Guid.from_i(n)
> end
>
> def to_base36
> self.to_i.to_s(36).tr('0-9a-z', 'a-z0-9').rjust(25, 'a')
> end
> end
>
> guids = [
> #'00000000-0000-0000-0000-000000000000',
> #'ffffffff-ffff-ffff-ffff-ffffffffffff',
> #'00000000-0000-0000-0000-000000000001',
> #'10000000-0000-0000-0000-000000000000',
> '7d77a27f-542c-4edb-9642-f0e324ae23a2',
> 'c44e6638-ed47-4ce9-9bb6-ba41bca2e535',
> 'aae9946e-64b2-4628-8f57-d0da26d29c86',
> '6cf2eba1-2f9d-463d-8538-6396f00b3f68',
> '6c518ced-1d56-46f9-af24-dd82726502ef',
> ].collect {|h| Guid.from_hex(h)}
>
> require 'benchmark'
>
> puts Benchmark.measure {
> 1000.times {
> guids.each {|g|
> raise SystemError unless g == Guid.from_base36(g.to_base36)
> }
> }
> }
> ===================================================================
>
> Regards,
> dave
>
>