[lnkForumImage]
TotalShareware - Download Free Software

Confronta i prezzi di migliaia di prodotti.
Asp Forum
 Home | Login | Register | Search 


 

Forums >

comp.lang.ruby

[QUIZ] Twisting a Rope (#137

James Gray

8/31/2007 1:20:00 PM

The three rules of Ruby Quiz:

1. Please do not post any solutions or spoiler discussion for this quiz until
48 hours have passed from the time on this message.

2. Support Ruby Quiz by submitting ideas as often as you can:

http://www.rub...

3. Enjoy!

Suggestion: A [QUIZ] in the subject of emails about the problem helps everyone
on Ruby Talk follow the discussion. Please reply to the original quiz message,
if you can.

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

by John Miller

This week's task is to implement the Rope data structure as a Ruby class. This
topic comes out of the ICFP programming competition
(http://www.icfpco...) which had competitors manipulating a 7.5 million
character string this year.

What is a Rope:

You may not realize it, but for many changes the content to a String, Ruby
creates a new copy of the original with the modifications applied. For small
strings that are created once and read often this is actually a very efficient
way to do thing, but what happens when the string starts to get long, and is
undergoing a lot of changes? First, the program will spend more and more of its
processing cycles just copying bits around. Second, the garbage collector will
be called more and more often to pick up the little stringy scraps you've left
all over the memory.

Ropes (the name is a pun for a heavy duty string) are tree structures where a
node represents the concatenation of its left branch with its right, and leaves
are flat strings. (This is a little sloppy. A rope may contain shared subtrees,
and is thus really a directed acyclic graph, where the out-edges of each vertex
are ordered. We will continue to be sloppy.) E.g. To prepend Text A to Text B,
one creates a Node (call it N1) with A as its left branch and B as its right.
To further append Text C create a new Node N2 with its left branch pointing to
N1 and its right to C. Easy, right? To find out more see Boehm, Atkinson and
Plass "Ropes: an Alternative to Strings" at:

http://rubyurl...

The task comes in three parts, each increasing in difficulty:

Part one:

Create a Rope class that can do the following:

a. 'append' or 'prepend' a String or another Rope
(alias the << operator to the append function)
b. Return the fully concatenated text with 'to_s'
c. define 'slice' to call to_s.slice
d. provide a 'length' method

Part two:

Add the following:

a. Add the ability to 'index' a single character given a 0-based offset
from the beginning of the string.
b. Add the ability to 'shift' a single character from the front of a Rope.
(Remove and return the character)
c. Add your own 'slice' method that returns a String. Implement as many of
the String method's forms as possible. To run the example code this
function will only need to understand the slice(offset,length) form.
Major Bonus for Regex and Substring forms.
d. "Balance" the tree with a 'normalize' method.
(see Boehm, Atkinson and Plass 1319 Rebalancing)

Part three: (bonus)

Add the following:

a. Change the 'slice' method to return a Rope. Ideally this method should
do as little string copying as possible. (Doing this will well
dramatically increase the speed of the example code)
b. Allow 'shift' to optionally accept an integer number of characters to
remove and return.
c. modify the '<<' operator so that can efficiently append a few
characters at a time. (see Boehm, Atkinson and Plass 1318 para. 4)
d. *Major Bonus* Add the ability to append and prepend IO classes in a
lazy fashion. (see Boehm, Atkinson and Plass 1318 para. 2)

The following code may help you explore how efficient your code is and show
where Ropes are useful. `ruby -r /path/to/your/rope/class this_script.rb Rope`
will run the test with your code. Run the script without arguments to see how
well String does. Also play around with the SIZE and CHUNKS constants to get a
feel for how they affect performance.

require 'benchmark'

#This code make a String/Rope of CHUNCKS chunks of text
#each chunck is SIZE bytes long. Each chunck starts with
#an 8 byte number. Initially the chuncks are shuffled the
#qsort method sorts them into ascending order.
#
#pass the name of the class to use as a parameter
#ruby -r rope.rb this_file Rope

puts 'preparing data...'
TextClass = Object.const_get(ARGV.shift || :String)

def qsort(text)
return TextClass.new if text.length == 0
pivot = text.slice(0,8).to_s.to_i
less = TextClass.new
more = TextClass.new
offset = 8+SIZE
while (offset < text.length)
i = text.slice(offset,8).to_s.to_i
(i < pivot ? less : more) << text.slice(offset,8+SIZE)
offset = offset + 8+SIZE
end
print "*"
return qsort(less) << text.slice(0,8+SIZE) << qsort(more)
end

SIZE = 512 * 1024
CHUNCKS = 128
CHARS = %w[R O P E]
data = TextClass.new
bulk_string =
TextClass.new(Array.new(SIZE) { CHARS[rand(4)] }.join)
puts 'Building Text...'
build = Benchmark.measure do
(0..CHUNCKS).sort_by { rand }.each do |n|
data<< sprintf("%08i",n) << bulk_string
end
data.normalize if data.respond_to? :normalize
end
GC.start
sort = Benchmark.measure do
puts "Sorting Text..."
qsort(data)
puts"\nEND"
end

puts "Build: #{build}Sort: #{sort}"

29 Answers

John Miller

9/1/2007 5:20:00 AM

0

My apologies,

The first line of the quiz should read:
"You may not realize it, but for many changes to the content of a
String,"

I am really looking forward to seeing some ingenious solutions to this
problem. The concept seemed very simple to me but my first attempt kept
becoming mired down in edge cases. I hope everyone has a good weekend
and a good Labor Day weekend to those in the US (plenty of time to work
on this weeks quiz ;)

John Miller


--
Posted via http://www.ruby-....

James Gray

9/1/2007 6:48:00 PM

0

On Sep 1, 2007, at 12:20 AM, John Miller wrote:

> The first line of the quiz should read:
> "You may not realize it, but for many changes to the content of a
> String,"

I've fixed this on the web site. Thanks for pointing it out.

James Edward Gray II

Carl Porth

9/2/2007 2:26:00 PM

0

I've done parts one and two so far. I'll try to add more in the next
few days.

For simplicity and speed, append and prepend don't modify the
receiver.

If anyone has any questions about my code, I'll be happy to answer.

Carl

require "rubygems"
require "facets"
require "kernel/with"
require "symbol/to_proc"

class String
def shift
return nil if empty?
returning self[0].chr do
self[0] = ""
end
end
end

class Rope
attr_reader :left, :right, :length

def Rope.new(*args)
if args.size <= 2
super
else # build balanced tree
mid = args.size / 2
args[mid,2] = Rope.new(*args[mid,2])
Rope.new(*args)
end
end

def initialize(left="",right="")
@left, @right = left, right
@length = @left.length + @right.length
end

def replace(rope)
initialize(rope.left,rope.right)
self
end

# clean out empty strings and rebuild tree
def normalize
Rope.new(*flat_strings.reject(&:empty?))
end

def to_s
flat_strings.join('')
end

def append(str)
Rope.new(self,str)
end
alias_method :<<, :append

def prepend(str)
Rope.new(str,self)
end

def slice(start,length=@length-start)
if start > left.length # right
right.slice(start-left.length,length-left.length)
elsif left.length < length # left and right
left.slice(start,left.length) +
right.slice(start-left.length,length-left.length)
else # left
left.slice(start,length)
end
end

def shift
if letter = left.shift || right.shift
@length -= 1
letter
else
nil
end
end

def index(letter,start=0)
if start > left.length # right
left.length + right.index(letter,start - left.length)
else # left
left.index(letter,start) || left.length + right.index(letter)
end
rescue
nil
end

# TODO implement rest of methods
def method_missing(method, *args, &block)
result = to_s.send(method, *args, &block)
if result.is_a?(String)
if method.to_s =~ /!$/
replace(Rope.new(result))
else
Rope.new(result)
end
else
result
end
end

protected

# traverse the tree
def traverse(&block)
@left.is_a?(Rope) ? @left.traverse(&block) : block.call(@left)
@right.is_a?(Rope) ? @right.traverse(&block) : block.call(@right)
end

# collect all the flat strings
def flat_strings
returning [] do |ary|
traverse { |str| ary << str }
end
end

end


On Aug 31, 6:19 am, Ruby Quiz <ja...@grayproductions.net> wrote:
> The three rules of Ruby Quiz:
>
> 1. Please do not post any solutions or spoiler discussion for this quiz until
> 48 hours have passed from the time on this message.
>
> 2. Support Ruby Quiz by submitting ideas as often as you can:
>
> http://www.rub...
>
> 3. Enjoy!
>
> Suggestion: A [QUIZ] in the subject of emails about the problem helps everyone
> on Ruby Talk follow the discussion. Please reply to the original quiz message,
> if you can.
>
> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- =-=-=
>
> by John Miller
>
> This week's task is to implement the Rope data structure as a Ruby class. This
> topic comes out of the ICFP programming competition
> (http://www.icfpco...) which had competitors manipulating a 7.5 million
> character string this year.
>
> What is a Rope:
>
> You may not realize it, but for many changes the content to a String, Ruby
> creates a new copy of the original with the modifications applied. For small
> strings that are created once and read often this is actually a very efficient
> way to do thing, but what happens when the string starts to get long, and is
> undergoing a lot of changes? First, the program will spend more and more of its
> processing cycles just copying bits around. Second, the garbage collector will
> be called more and more often to pick up the little stringy scraps you've left
> all over the memory.
>
> Ropes (the name is a pun for a heavy duty string) are tree structures where a
> node represents the concatenation of its left branch with its right, and leaves
> are flat strings. (This is a little sloppy. A rope may contain shared subtrees,
> and is thus really a directed acyclic graph, where the out-edges of each vertex
> are ordered. We will continue to be sloppy.) E.g. To prepend Text A to Text B,
> one creates a Node (call it N1) with A as its left branch and B as its right.
> To further append Text C create a new Node N2 with its left branch pointing to
> N1 and its right to C. Easy, right? To find out more see Boehm, Atkinson and
> Plass "Ropes: an Alternative to Strings" at:
>
> http://rubyurl...
>
> The task comes in three parts, each increasing in difficulty:
>
> Part one:
>
> Create a Rope class that can do the following:
>
> a. 'append' or 'prepend' a String or another Rope
> (alias the << operator to the append function)
> b. Return the fully concatenated text with 'to_s'
> c. define 'slice' to call to_s.slice
> d. provide a 'length' method
>
> Part two:
>
> Add the following:
>
> a. Add the ability to 'index' a single character given a 0-based offset
> from the beginning of the string.
> b. Add the ability to 'shift' a single character from the front of a Rope.
> (Remove and return the character)
> c. Add your own 'slice' method that returns a String. Implement as many of
> the String method's forms as possible. To run the example code this
> function will only need to understand the slice(offset,length) form.
> Major Bonus for Regex and Substring forms.
> d. "Balance" the tree with a 'normalize' method.
> (see Boehm, Atkinson and Plass 1319 Rebalancing)
>
> Part three: (bonus)
>
> Add the following:
>
> a. Change the 'slice' method to return a Rope. Ideally this method should
> do as little string copying as possible. (Doing this will well
> dramatically increase the speed of the example code)
> b. Allow 'shift' to optionally accept an integer number of characters to
> remove and return.
> c. modify the '<<' operator so that can efficiently append a few
> characters at a time. (see Boehm, Atkinson and Plass 1318 para. 4)
> d. *Major Bonus* Add the ability to append and prepend IO classes in a
> lazy fashion. (see Boehm, Atkinson and Plass 1318 para. 2)
>
> The following code may help you explore how efficient your code is and show
> where Ropes are useful. `ruby -r /path/to/your/rope/class this_script.rb Rope`
> will run the test with your code. Run the script without arguments to see how
> well String does. Also play around with the SIZE and CHUNKS constants to get a
> feel for how they affect performance.
>
> require 'benchmark'
>
> #This code make a String/Rope of CHUNCKS chunks of text
> #each chunck is SIZE bytes long. Each chunck starts with
> #an 8 byte number. Initially the chuncks are shuffled the
> #qsort method sorts them into ascending order.
> #
> #pass the name of the class to use as a parameter
> #ruby -r rope.rb this_file Rope
>
> puts 'preparing data...'
> TextClass = Object.const_get(ARGV.shift || :String)
>
> def qsort(text)
> return TextClass.new if text.length == 0
> pivot = text.slice(0,8).to_s.to_i
> less = TextClass.new
> more = TextClass.new
> offset = 8+SIZE
> while (offset < text.length)
> i = text.slice(offset,8).to_s.to_i
> (i < pivot ? less : more) << text.slice(offset,8+SIZE)
> offset = offset + 8+SIZE
> end
> print "*"
> return qsort(less) << text.slice(0,8+SIZE) << qsort(more)
> end
>
> SIZE = 512 * 1024
> CHUNCKS = 128
> CHARS = %w[R O P E]
> data = TextClass.new
> bulk_string =
> TextClass.new(Array.new(SIZE) { CHARS[rand(4)] }.join)
> puts 'Building Text...'
> build = Benchmark.measure do
> (0..CHUNCKS).sort_by { rand }.each do |n|
> data<< sprintf("%08i",n) << bulk_string
> end
> data.normalize if data.respond_to? :normalize
> end
> GC.start
> sort = Benchmark.measure do
> puts "Sorting Text..."
> qsort(data)
> puts"\nEND"
> end
>
> puts "Build: #{build}Sort: #{sort}"


Carl Porth

9/2/2007 5:00:00 PM

0

I've modified my Rope so it runs with the benchmark (and fixed some
bugs).

http://pastie.cabo...

with the included benchmark:

badcarl@navi> ruby -r lib/rope.rb benchmark.rb String
Build: 0.150000 0.080000 0.230000 ( 0.234209)
Sort: 1.700000 1.850000 3.550000 ( 3.613295)

badcarl@navi> ruby -r lib/rope.rb benchmark.rb Rope
Build: 0.000000 0.000000 0.000000 ( 0.009324)
Sort: 0.280000 0.080000 0.360000 ( 0.372736)

I'm getting around 10x speedup on sorting.


On Sep 2, 7:26 am, Carl Porth <badc...@gmail.com> wrote:
> I've done parts one and two so far. I'll try to add more in the next
> few days.
>
> For simplicity and speed, append and prepend don't modify the
> receiver.
>
> If anyone has any questions about my code, I'll be happy to answer.
>
> Carl
>
> require "rubygems"
> require "facets"
> require "kernel/with"
> require "symbol/to_proc"
>
> class String
> def shift
> return nil if empty?
> returning self[0].chr do
> self[0] = ""
> end
> end
> end
>
> class Rope
> attr_reader :left, :right, :length
>
> def Rope.new(*args)
> if args.size <= 2
> super
> else # build balanced tree
> mid = args.size / 2
> args[mid,2] = Rope.new(*args[mid,2])
> Rope.new(*args)
> end
> end
>
> def initialize(left="",right="")
> @left, @right = left, right
> @length = @left.length + @right.length
> end
>
> def replace(rope)
> initialize(rope.left,rope.right)
> self
> end
>
> # clean out empty strings and rebuild tree
> def normalize
> Rope.new(*flat_strings.reject(&:empty?))
> end
>
> def to_s
> flat_strings.join('')
> end
>
> def append(str)
> Rope.new(self,str)
> end
> alias_method :<<, :append
>
> def prepend(str)
> Rope.new(str,self)
> end
>
> def slice(start,length=@length-start)
> if start > left.length # right
> right.slice(start-left.length,length-left.length)
> elsif left.length < length # left and right
> left.slice(start,left.length) +
> right.slice(start-left.length,length-left.length)
> else # left
> left.slice(start,length)
> end
> end
>
> def shift
> if letter = left.shift || right.shift
> @length -= 1
> letter
> else
> nil
> end
> end
>
> def index(letter,start=0)
> if start > left.length # right
> left.length + right.index(letter,start - left.length)
> else # left
> left.index(letter,start) || left.length + right.index(letter)
> end
> rescue
> nil
> end
>
> # TODO implement rest of methods
> def method_missing(method, *args, &block)
> result = to_s.send(method, *args, &block)
> if result.is_a?(String)
> if method.to_s =~ /!$/
> replace(Rope.new(result))
> else
> Rope.new(result)
> end
> else
> result
> end
> end
>
> protected
>
> # traverse the tree
> def traverse(&block)
> @left.is_a?(Rope) ? @left.traverse(&block) : block.call(@left)
> @right.is_a?(Rope) ? @right.traverse(&block) : block.call(@right)
> end
>
> # collect all the flat strings
> def flat_strings
> returning [] do |ary|
> traverse { |str| ary << str }
> end
> end
>
> end
>
> On Aug 31, 6:19 am, Ruby Quiz <ja...@grayproductions.net> wrote:
>
> > The three rules of Ruby Quiz:
>
> > 1. Please do not post any solutions or spoiler discussion for this quiz until
> > 48 hours have passed from the time on this message.
>
> > 2. Support Ruby Quiz by submitting ideas as often as you can:
>
> >http://www.rub...
>
> > 3. Enjoy!
>
> > Suggestion: A [QUIZ] in the subject of emails about the problem helps everyone
> > on Ruby Talk follow the discussion. Please reply to the original quiz message,
> > if you can.
>
> > -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- =-=-=
>
> > by John Miller
>
> > This week's task is to implement the Rope data structure as a Ruby class. This
> > topic comes out of the ICFP programming competition
> > (http://www.icfpco...) which had competitors manipulating a 7.5 million
> > character string this year.
>
> > What is a Rope:
>
> > You may not realize it, but for many changes the content to a String, Ruby
> > creates a new copy of the original with the modifications applied. For small
> > strings that are created once and read often this is actually a very efficient
> > way to do thing, but what happens when the string starts to get long, and is
> > undergoing a lot of changes? First, the program will spend more and more of its
> > processing cycles just copying bits around. Second, the garbage collector will
> > be called more and more often to pick up the little stringy scraps you've left
> > all over the memory.
>
> > Ropes (the name is a pun for a heavy duty string) are tree structures where a
> > node represents the concatenation of its left branch with its right, and leaves
> > are flat strings. (This is a little sloppy. A rope may contain shared subtrees,
> > and is thus really a directed acyclic graph, where the out-edges of each vertex
> > are ordered. We will continue to be sloppy.) E.g. To prepend Text A to Text B,
> > one creates a Node (call it N1) with A as its left branch and B as its right.
> > To further append Text C create a new Node N2 with its left branch pointing to
> > N1 and its right to C. Easy, right? To find out more see Boehm, Atkinson and
> > Plass "Ropes: an Alternative to Strings" at:
>
> > http://rubyurl...
>
> > The task comes in three parts, each increasing in difficulty:
>
> > Part one:
>
> > Create a Rope class that can do the following:
>
> > a. 'append' or 'prepend' a String or another Rope
> > (alias the << operator to the append function)
> > b. Return the fully concatenated text with 'to_s'
> > c. define 'slice' to call to_s.slice
> > d. provide a 'length' method
>
> > Part two:
>
> > Add the following:
>
> > a. Add the ability to 'index' a single character given a 0-based offset
> > from the beginning of the string.
> > b. Add the ability to 'shift' a single character from the front of a Rope.
> > (Remove and return the character)
> > c. Add your own 'slice' method that returns a String. Implement as many of
> > the String method's forms as possible. To run the example code this
> > function will only need to understand the slice(offset,length) form.
> > Major Bonus for Regex and Substring forms.
> > d. "Balance" the tree with a 'normalize' method.
> > (see Boehm, Atkinson and Plass 1319 Rebalancing)
>
> > Part three: (bonus)
>
> > Add the following:
>
> > a. Change the 'slice' method to return a Rope. Ideally this method should
> > do as little string copying as possible. (Doing this will well
> > dramatically increase the speed of the example code)
> > b. Allow 'shift' to optionally accept an integer number of characters to
> > remove and return.
> > c. modify the '<<' operator so that can efficiently append a few
> > characters at a time. (see Boehm, Atkinson and Plass 1318 para. 4)
> > d. *Major Bonus* Add the ability to append and prepend IO classes in a
> > lazy fashion. (see Boehm, Atkinson and Plass 1318 para. 2)
>
> > The following code may help you explore how efficient your code is and show
> > where Ropes are useful. `ruby -r /path/to/your/rope/class this_script.rb Rope`
> > will run the test with your code. Run the script without arguments to see how
> > well String does. Also play around with the SIZE and CHUNKS constants to get a
> > feel for how they affect performance.
>
> > require 'benchmark'
>
> > #This code make a String/Rope of CHUNCKS chunks of text
> > #each chunck is SIZE bytes long. Each chunck starts with
> > #an 8 byte number. Initially the chuncks are shuffled the
> > #qsort method sorts them into ascending order.
> > #
> > #pass the name of the class to use as a parameter
> > #ruby -r rope.rb this_file Rope
>
> > puts 'preparing data...'
> > TextClass = Object.const_get(ARGV.shift || :String)
>
> > def qsort(text)
> > return TextClass.new if text.length == 0
> > pivot = text.slice(0,8).to_s.to_i
> > less = TextClass.new
> > more = TextClass.new
> > offset = 8+SIZE
> > while (offset < text.length)
> > i = text.slice(offset,8).to_s.to_i
> > (i < pivot ? less : more) << text.slice(offset,8+SIZE)
> > offset = offset + 8+SIZE
> > end
> > print "*"
> > return qsort(less) << text.slice(0,8+SIZE) << qsort(more)
> > end
>
> > SIZE = 512 * 1024
> > CHUNCKS = 128
> > CHARS = %w[R O P E]
> > data = TextClass.new
> > bulk_string =
> > TextClass.new(Array.new(SIZE) { CHARS[rand(4)] }.join)
> > puts 'Building Text...'
> > build = Benchmark.measure do
> > (0..CHUNCKS).sort_by { rand }.each do |n|
> > data<< sprintf("%08i",n) << bulk_string
> > end
> > data.normalize if data.respond_to? :normalize
> > end
> > GC.start
> > sort = Benchmark.measure do
> > puts "Sorting Text..."
> > qsort(data)
> > puts"\nEND"
> > end
>
> > puts "Build: #{build}Sort: #{sort}"


Eric Mahurin

9/3/2007 6:51:00 AM

0

On 8/31/07, Ruby Quiz <james@grayproductions.net> wrote:
> This week's task is to implement the Rope data structure as a Ruby class.

My implementation is attached along with a modified test. I made a
few changes to what was asked, but this also gave more flexibility.
Here is what I did:

* Just as the implementation in the paper referenced in this quiz, my
implementation gives immutable ropes. #<< is non-destructive, so I
had to change the test to do <<= instead of just <<. Also, I didn't
implement #shift, because it must be destructive. The same
functionality can be achieved with 2 calls to #slice (one could be #at
if you only need to shift one element). There are very good reasons
to make ropes immutable. I'd imagine almost every other
implementation with modifiable ropes will fail when someone starts
modifying a sub-rope that is shared. You'd really need a good COW
(copy-on-write) scheme to allow both transparent sharing and
modification. I didn't see that it was worth the
effort/complexity/risk. I chose the simple functional programming
approach (immutable objects).

* I chose to automatically balance the ropes during concatenation (no
normalize). I used the same tree rotations that are used with AVL
trees. Another option could be to treat these as red-black trees
which might save on some rotations. One reason I automatically
balanced is that it simplifies the interface. The user of the API
doesn't have to worry about when to normalize. A second reason is
that every other rope operation is O(log(n)), so there probably isn't
much benefit in making only concatenation O(1).

* I don't allow ropes to directly use Strings. Instead, a StringRope
is used as a proxy for a String. To use a String directly, I would
have had to check the class, use respond_to, modify String to look
like a rope, etc. Instead, I just went with the pure duck-typing
approach and made multiple Rope-like classes that use different types
of data. My basis for these is the class ArrayRope. There is no
reason why a rope data-structure can't be used with any sequence of
objects instead of just characters. ArrayRope takes an Array-like
object. An rope built out of these is to Array as a conventional rope
is to String. I also added an IORope class to lazily work with files.
Using IORope you could make a text editor that didn't have to read
the whole file in at once. There is no reason you can't mix and match
any of these leaf rope classes (depth 0) within a rope tree.

* #each iterates through elements (i.e. characters) in the rope. It
annoys me that String#each (and IO#each for that matter) iterates over
lines - it should be characters (not necessarily bytes). All of the
Enumerable methods are accessible too.

* I used #at instead of #index because #index is used for searching in
String/Array. Array#at is an exact match.

The main thing I didn't do was implement any regex stuff. I don't see
how this is doable since all of the regex methods are completely tied
to String (not duck-typed). You'd have to convert the whole rope to
string to do anything (which defeats the purpose of the rope).

Eric Mahurin

9/3/2007 7:11:00 AM

0

On 9/3/07, Eric Mahurin <eric.mahurin@gmail.com> wrote:
> On 8/31/07, Ruby Quiz <james@grayproductions.net> wrote:
> > This week's task is to implement the Rope data structure as a Ruby class.
>
> My implementation is attached along with a modified test. I made a
> few changes to what was asked, but this also gave more flexibility.

Forgot about the results. Here's what I found on my machine:

String
Build: 0.120000 0.080000 0.200000 ( 0.206264)
Sort: 1.680000 0.420000 2.100000 ( 2.112934)
VmPeak: 1192112 kB
VmSize: 493708 kB

StringRope
Build: 0.010000 0.000000 0.010000 ( 0.014414)
Sort: 0.150000 0.020000 0.170000 ( 0.176734)
VmPeak: 38940 kB
VmSize: 37920 kB

so, 20X faster for build and 12.4X faster for sort. Memory looks to
be 10-30X smaller. I ran the build and sort 8 times during the
benchmark and took the min. The memory kept increasing for the String
runs for each of these 8 iterations which doesn't make sense.

Mauricio Fernández

9/3/2007 8:53:00 AM

0

On Fri, Aug 31, 2007 at 10:19:54PM +0900, Ruby Quiz wrote:
> by John Miller
>
> This week's task is to implement the Rope data structure as a Ruby class. This
> topic comes out of the ICFP programming competition
> (http://www.icfpco...) which had competitors manipulating a 7.5 million
> character string this year.

I happened to have implemented ropes in OCaml recently, so I generated a Ruby
extension using rocaml to see how well it would perform.

Without further ado, here are the results I'm getting for SIZE = 512 * 1024,
CHUNKS = 512:

$ time ruby -r himadri_choudhury.rb bm.rb Rope
Build: 0.130000 0.000000 0.130000 ( 0.129476)
Sort: 10.340000 0.050000 10.390000 ( 10.648223)

$ time ruby -rocaml_rope bm.rb OCaml::Rope
Build: 0.020000 0.000000 0.020000 ( 0.018946)
Sort: 0.100000 0.000000 0.100000 ( 0.108499)

$ ruby eric_mahurin.rb StringRope
[...]
Build: 0.060000 0.000000 0.060000 ( 0.057299)
Sort: 0.870000 0.000000 0.870000 ( 0.896493)

For SIZE = 1024, CHUNKS = 16384:

$ ruby eric_mahurin.rb StringRope
[...]
Build: 3.470000 0.040000 3.510000 ( 3.588875)
Sort: 89.110000 0.700000 89.810000 ( 92.179962)

$ time ruby -rocaml_rope bm.rb OCaml::Rope
[...]
Build: 0.360000 0.000000 0.360000 ( 0.378352)
Sort: 3.940000 0.040000 3.980000 ( 4.079140)

At that point the pure Ruby rope is taking over 6 times more memory than
the OCaml one. I ascribe this to iv_tbls being very heavy and to memory
fragmentation.

I benchmarked Himadri's implementation first and was surprised by the
exceedingly large speed difference --- I expected one, not two orders of
magnitude for this code, as there's enough Ruby code in common in qsort to
mask the speed gains in the rope operations. However, Eric's solution proved
that it was just due to a slow Ruby implementation.

Here's the interface definition (extconf.rb):


EXT_NAME = "ocaml_rope"
OCAML_PACKAGES = %w[]
CAML_LIBS = %w[]
CAML_OBJS = %w[]
CAML_FLAGS = ""
CAML_INCLUDES = []

require 'rocaml'

Interface.generate("ocaml_rope") do |iface|
def_class("Rope", :under => "OCaml") do |c|
rope = c.abstract_type

fun "empty", UNIT => rope, :as => "new_empty"
fun "of_string", STRING => rope, :as => "new_from_string"

method "sub", [rope, INT, INT] => rope, :as => "slice"
method "concat", [rope, rope] => rope
method "length", rope => INT
method "get", [rope, INT] => INT
method "to_string", rope => STRING, :as => "to_s"
end
end

require 'rocaml_extconf'


As you can see, OCaml::Rope is purely functional, and the interface differs a
bit from that expected by bm.rb (a modified version that works with immutable
ropes is attached), so I adapted it with the following ocaml_rope.rb, which
also loads the extension:


module OCaml # Rope will be placed in this module
end

require "ocaml_rope.so"

module OCaml
class Rope
def self.new(str = "")
case str
when String; new_from_string str
when Rope; str
when ""; new_empty
else new_from_string(str.to_str) rescue new_from_string(str.to_s)
end
end

def prepend(rope)
rope.append(self)
end

alias_method :append, :concat
alias_method :<<, :append
end
end


The OCaml code is attached, in case anybody wants to look at it.
Incidentally, it weighs a bit under 220 lines, which is also the amount taken
by Himadri's and Eric's solutions. Unlike them, rope.ml features O(1)
concatenation for small elements; this accounts for a large part of the code
and the complexity of some patterns. O(1) concatenation doesn't really affect
performance in the use case exerted by bm.rb anyway.

--
Mauricio Fernandez - http://eige...

Gustav Munkby

9/3/2007 9:43:00 AM

0

On 8/31/07, Ruby Quiz <james@grayproductions.net> wrote:
> by John Miller
>
> This week's task is to implement the Rope data structure as a Ruby class. This
> topic comes out of the ICFP programming competition
> (http://www.icfpco...) which had competitors manipulating a 7.5 million
> character string this year.

My solution deviates slightly from the problem specification.
The most important difference is that instead of implementing
a binary tree to store the strings, they are all stored in an
array instead.

The class itself is quite long, since I wanted to implement
many of the methods of the built-in String class. Some of the
methods will require significant work to actually implement.
Most notably the Regexp-related methods, since there is no
way to instruct the Regexp matcher to ask for more characters
once it has reached the end of a string.

Actually, all String methods are implemented, by implementing
method missing and delegating to the composed string if the
string class can handle the method. After delegating to the
string class, we convert any String result to a new rope and
we also make sure to replace our content by the string we
delegated to, to make sure that destructive methods works as
expected.

In fact, we replace the content of our rope as soon as to_s
is called. since the reason for lazy concatenation is to
avoid the concatenation cost, we can just as well keep the
concatenated string when we already had to pay the price.

The benchmark results are:

# String
Build: 0.170000 0.150000 0.320000 ( 0.324341)
Sort: 3.470000 3.630000 7.100000 ( 7.126741)

# ArrayRope
Build: 0.010000 0.010000 0.020000 ( 0.009744)
Sort: 0.130000 0.000000 0.130000 ( 0.138330)

# ArrayRope (with dup)
Build: 0.240000 0.160000 0.400000 ( 0.402201)
Sort: 0.160000 0.000000 0.160000 ( 0.163022)

For the unprotected case, sorting was ~52 times faster,
and building was ~33 times faster.

However, since the string class is not immutable, there is a
slight problem with this approach. The strings could added
to the rope could be modified "under the hood". We can easily
protect against that by making copies of the strings when we
add them to the rope. Since the built-in String shares the
actual data between the two instances (until they change),
this is not so much of a memory issue as one could expect.

By adding dup (in initialize/append/prepend) we end up with
the following times, which trades of some of our speed/memory
for a bit of safety. (This is actually the submitted solution)

Compared with the string approach, building is now (for obvious
reasons) slower than the String, but only about 25%.
Sorting is still much faster than the String case, namely ~44
times as fast.

!g

Mauricio Fernández

9/3/2007 10:14:00 AM

0

On Mon, Sep 03, 2007 at 05:52:31PM +0900, Mauricio Fernandez wrote:
> I happened to have implemented ropes in OCaml recently, so I generated a Ruby
> extension using rocaml to see how well it would perform.
>
[...]
> For SIZE = 1024, CHUNKS = 16384:
>
> $ ruby eric_mahurin.rb StringRope
> [...]
> Build: 3.470000 0.040000 3.510000 ( 3.588875)
> Sort: 89.110000 0.700000 89.810000 ( 92.179962)
>
> $ time ruby -rocaml_rope bm.rb OCaml::Rope
> [...]
> Build: 0.360000 0.000000 0.360000 ( 0.378352)
> Sort: 3.940000 0.040000 3.980000 ( 4.079140)

Also for laughs, playing with the GC parameters and with a qsort implemented
in OCaml:

$ OCAMLRUNPARAM=s=512k ruby -rocaml_rope bm.rb OCaml::Rope
[...]
Build: 0.290000 0.000000 0.290000 ( 0.290908)
Sort: 3.410000 0.040000 3.450000 ( 3.547792)
Sort': 0.830000 0.000000 0.830000 ( 0.845180)

Yielding the expected >100x gain over Ruby.

The interface part:

method "qsort", [rope, INT] => rope
method "qsort2", [rope, INT] => rope


I implemented the qsort function both in functional and imperative style, for
the sake of those who prefer something that reads similarly to the Ruby
version. The two variants are equally fast.


let (<<) = concat (* OCaml allows you to define new operators *)
let to_i = int_of_string
let to_s = to_string

let rec qsort' size = function
Empty -> Empty
| rope ->
let pivot = to_i (to_s (sub 0 8 rope)) in
let len = 8 + size in
let less = ref Empty in
let more = ref Empty in
let off = ref len in
while !off < length rope do
let slice = sub !off len rope in
if to_i (to_s (sub !off 8 rope)) < pivot then
less := !less << slice
else
more := !more << slice;
off := !off + len
done;
qsort' size !less << sub 0 len rope << qsort' size !more


let rec qsort size = function
Empty -> Empty
| rope ->
let rec loop r pivot off len max less more =
if off < max then begin
if to_i (to_s (sub off 8 r)) < pivot then
loop r pivot (off+len) len max (less << (sub off len r)) more
else
loop r pivot (off+len) len max less (more << (sub off len r))
end else (less, more) in

let pivot = to_i (to_s (sub 0 8 rope)) in
let len = 8 + size in
let less, more = loop rope pivot len len (length rope) Empty Empty in
qsort size less << sub 0 len rope << qsort size more


--
Mauricio Fernandez - http://eige...

Ari Brown

9/3/2007 4:24:00 PM

0


Ok, So I'm still trying to figure out what stores the characters in a
Rope. A string or an array?

Judging from the fact that you have ArrayRope, I'm thinking it might
be an Array.

On Sep 3, 2007, at 5:42 AM, Gustav Munkby wrote:


> # ArrayRope
> Build: 0.010000 0.010000 0.020000 ( 0.009744)
> Sort: 0.130000 0.000000 0.130000 ( 0.138330)
>
> # ArrayRope (with dup)
> Build: 0.240000 0.160000 0.400000 ( 0.402201)
> Sort: 0.160000 0.000000 0.160000 ( 0.163022)

Help!
Ari
--------------------------------------------|
If you're not living on the edge,
then you're just wasting space.