Adam Shelly
12/15/2005 5:32:00 AM
On 12/14/05, Steve [RubyTalk] <steve_rubytalk@shic.co.uk> wrote:
>
> Given that I only want to compute the offsets once, an obvious solution
> would be to construct an Array of String - each element representing a
> sub-string of the original... but this would double memory use. What
> would be the best way to avoid duplicating the character sequences and
> causing run-time bloat?
You don't need to create the substrings until you need them: Here's a
test which only allocates memory for the substrings as needed. As far
as I can tell, the substrings are released back to the GC as soon as
you are done with them.
class IndexedString < String
def initialize s, &indexer
super s
@offsets = yield self
self
end
def substring n
(0...(@offsets.size-1)) === n ?
self[@offsets[n]...@offsets[n+1]] :
nil
end
end
def build_index s
indexes = [0]
while (n=s.index('.',indexes.last+1)) do indexes << n+1 end
indexes << s.length+1
end
s = IndexedString.new( (("TESTINGTESTING!!"*1024)+".")*100) do |str|
build_index(str)
end #= +1600K
s.substring 10; #+16K
s.substring 95; #+16K
> By corollary, if I had a large number of Strings, what would be the most
> memory efficient way to represent their concatenation? If I had n mK
> stings, would I need another n*mK contiguous block of memory to
> represent their concatenation?
Not necessarily: You can build a class which takes an array of
strings and returns
"superstrings" only as needed. Here are my passing test cases, but
the class still needs a little work before I'll post it:
def test_MegaString
small_strings = %w( this is a collection of small strings)
mega = MegaString.new(small_strings)
assert_equal("t",mega[0])
assert_equal("this",mega[0,4])
assert_equal("hisisaco",mega[1,8])
assert_equal("ollect",mega[8,6])
assert_equal("a", mega[6...7])
assert_equal("col", mega[7..10])
assert_equal("lstrings", mega[23,-1])
assert_equal("lstrings", mega[23,230])
assert_equal("thisisacollectionofsmallstrings", mega.to_s)
end
-Adam