[lnkForumImage]
TotalShareware - Download Free Software

Confronta i prezzi di migliaia di prodotti.
Asp Forum
 Home | Login | Register | Search 


 

Forums >

comp.lang.ruby

Rope Data Structures

John Miller

8/22/2007 7:03:00 PM

Greetings All,

I have been plugging away a a parser for ruby Endo's DNA from this years
International Functional Programming Completion.
(http://www.icfpco...) Most teems who have put up there reviews
have talked about having to switch programming languages to something
faster. Most did this before trying to change algorithms. Many of the
successful teems ended up using the 'rope' from the C++ Standard
Template Library.

please see:

http://en.wikipedia.org/...(computer_science)
http://www.sgi.com/tech/stl/rop...
http://www.cs.ubc.ca/local/reading/proceedings/spe91-95/spe/vol25/issue12/...

I tried writing a rope library in ruby and got an order of magnitude
better performance when compared to a String implementation (6.5 to 48.5
iterations per second -- to successfully solve the problem I estimated I
would need ~2000 iterations per second)

I learned several things from this attempt:
* Avoid creating edge cases they are bad for business and my poor rope
implementation had way too many of them.
* Working with long strings is a slow process (endo's dna is a string
~7.5M chars long)
* For long running programs calling GC.start at points where the number
of living object is at a minimum will save time and Hard Drive actuator
motors.
* The folks who wrote the ruby-prof gem deserve an award for making an
excelent tool.

One thing that got me thinking however was that even my poorly
implemented pure Ruby version of ropes gave me an order of magnitude
extra speed. It made me wonder: How hard would it be to implement ropes
as a compiled binary library for ruby, and how much would using such a
library (even a pure ruby version) speed up rails, which does a lot of
string concatenation?
--
Posted via http://www.ruby-....

8 Answers

James Gray

8/22/2007 7:27:00 PM

0

On Aug 22, 2007, at 2:03 PM, John Miller wrote:

> Most teems who have put up there reviews
> have talked about having to switch programming languages to something
> faster. Most did this before trying to change algorithms.

We used Ruby and didn't switch, but the code was slow.

> Many of the
> successful teems ended up using the 'rope' from the C++ Standard
> Template Library.
>
> please see:
>
> http://en.wikipedia.org/...(computer_science)
> http://www.sgi.com/tech/stl/rop...
> http://www.cs.ubc.ca/local/reading/proceedings/spe91-95/...
> issue12/spe986.pdf
>
> I tried writing a rope library in ruby and got an order of magnitude
> better performance when compared to a String implementation (6.5 to
> 48.5
> iterations per second -- to successfully solve the problem I
> estimated I
> would need ~2000 iterations per second)

I debated for some time about making this task, building a rope for
Ruby, a Ruby Quiz. Can I ask how long it took you? Also, just as a
rough metric, how many lines of code did you write?

James Edward Gray II


John Miller

8/23/2007 2:59:00 AM

0

James Gray wrote:
> On Aug 22, 2007, at 2:03 PM, John Miller wrote:
>
>> Most teems who have put up there reviews
>> have talked about having to switch programming languages to something
>> faster. Most did this before trying to change algorithms.
>
> We used Ruby and didn't switch, but the code was slow.
>
>>
>> I tried writing a rope library in ruby and got an order of magnitude
>> better performance when compared to a String implementation (6.5 to
>> 48.5
>> iterations per second -- to successfully solve the problem I
>> estimated I
>> would need ~2000 iterations per second)
>
> I debated for some time about making this task, building a rope for
> Ruby, a Ruby Quiz. Can I ask how long it took you? Also, just as a
> rough metric, how many lines of code did you write?
>
> James Edward Gray II

It took about a week of causal programming to get a String version
running with no RNA output. To write writing a Rope class was another
three days of sloppy programming. Replacing String with Rope took about
2 hours. I then spent more time then I can recall fixing edge cases in
the Rope class. There are still a few loose ends, and the code has
become so full of edge case checks that it really isn't as fast as it
should be.

Looking only at the first 10,000 iterations, the fast majority of
instructions were copying huge swaths of the original dna. I decided
that a node should be able to represent a 'slice' of another node
without copying when these slices start having to span concatenations
life gets messy. When I added a pop command that could make the left
branch unused things got even more messy. When I wanted a << operator
that would add characters directly to the end of 'short' node things
became atrocious. And when finally I required the Environment DNA from
matchreplace(where my code spent 94.2% of it's time) not create new
copies of the Ropes it was holding when it was perpended to the dna the
code pretty much buckled under its own weight. At the moment there is a
bug in the normalization code that is introducing errors. Without being
able to renormalizes the data structure on th rope gets too deep and
everything grinds to a halt.

Lesson: Keep you data structures simple and elegant.

My rope implementation was initially ~120 (source file) lines. Now that
it is doing all this stuff it is 366. 44 of that is for a kmp_search
algorithm.

I would love to see Ropes as a Ruby quiz.

John Miller
--
Posted via http://www.ruby-....

benjohn

8/23/2007 8:05:00 AM

0

Just a musing on this, but if we want a fast and scalable rope structure
in Ruby (and speed and scalability are the reasons for using a rope, I
believe?), wouldn't it be better (though not quite so much fun) to
simply wrap a c / c++ rope library, rather than implement it from
scratch in Ruby?

Perhaps it'd even be possible to get Ruby's string to switch internal
implementation using some huristics about its size.

Cheers,
Benjohn


Peña, Botp

8/23/2007 8:25:00 AM

0

On Behalf Of John Miller:
# I tried writing a rope library in ruby and got an order of magnitude
# better performance when compared to a String implementation (6.5 to 48.5

wow, that's quite a helpful boost. specially since a large percentage of the web and database time is spent on string manipulation. post your work as a gem, pls.

kind regards -botp

Jimmy Kofler

8/23/2007 9:42:00 AM

0

Mauricio Fernandez, the author of rocaml, is working on a ropes
implementation called oropes, cf. http://eigenclass.org/rep...
and
http://eigenclass.org/rep...head/doc/Rope.html . For a discussion
see "Ropes and rope-like functional extensible vectors with O(1)
prepend/append",
http://groups.google.com/group/fa.caml/browse_thread/thread/4e70be...

James Gray

8/23/2007 12:05:00 PM

0

On Aug 22, 2007, at 9:59 PM, John Miller wrote:

> James Gray wrote:
>> On Aug 22, 2007, at 2:03 PM, John Miller wrote:
>>
>>> Most teems who have put up there reviews
>>> have talked about having to switch programming languages to
>>> something
>>> faster. Most did this before trying to change algorithms.
>>
>> We used Ruby and didn't switch, but the code was slow.
>>
>>>
>>> I tried writing a rope library in ruby and got an order of magnitude
>>> better performance when compared to a String implementation (6.5 to
>>> 48.5
>>> iterations per second -- to successfully solve the problem I
>>> estimated I
>>> would need ~2000 iterations per second)
>>
>> I debated for some time about making this task, building a rope for
>> Ruby, a Ruby Quiz. Can I ask how long it took you? Also, just as a
>> rough metric, how many lines of code did you write?
>>
>> James Edward Gray II
>
> It took about a week of causal programming to get a String version
> running with no RNA output.

Thanks for all of the information.

> I would love to see Ropes as a Ruby quiz.

Contact me off-list if you are interested in helping me put it together.

James Edward Gray II

John Miller

8/23/2007 7:59:00 PM

0

>Just a musing on this, but if we want a fast and scalable rope structure
>in Ruby (and speed and scalability are the reasons for using a rope, I
>believe?), wouldn't it be better (though not quite so much fun) to
>simply wrap a c / c++ rope library, rather than implement it from
>scratch in Ruby?
>
>Perhaps it'd even be possible to get Ruby's string to switch internal
>implementation using some huristics about its size.
>
>Cheers,
> Benjohn

I agree that for something like this a binary distribution is best, but
they still cause problems in cross platform environments. (eg. JRuby) I
think the best solutions are 1.) Have Multiple Gem distros (java,
mswin32, source) and include a pure ruby version. Or 2.) integrate this
into the interpreter core as another built in class.


Peña, Botp wrote:
> On Behalf Of John Miller:
> # I tried writing a rope library in ruby and got an order of magnitude
> # better performance when compared to a String implementation (6.5 to
> 48.5
>
> wow, that's quite a helpful boost. specially since a large percentage of
> the web and database time is spent on string manipulation. post your
> work as a gem, pls.
>
> kind regards -botp

The ICFP task was written in such a way as to make maximum use of Ropes.
I think that processing eRB template would also see pretty good gains.
However, one must be careful to use Ropes only when string concatenation
and slicing (Ropes do these in O(1) ) are much more common then
accessing the string content (O(log(n)). An application where a string
is build once and the accessed repeatedly will have poorer performance
if it is converted to a rope. On the other hand a string that is
repeatedly manipulated before finally being printed a single time will
see performance gains proportional to the average length of the
String/Rope.

Databases do spend lots of time with strings, but there are many more
accesses to strings then changes. Beyond that most of the strings that
databases deal with are "short."

I will see what I can do about getting out some sample code.

John Miller

--
Posted via http://www.ruby-....

Daniel Berger

10/20/2007 1:04:00 PM

0

John Miller wrote:
> Greetings All,
>
> I have been plugging away a a parser for ruby Endo's DNA from this years
> International Functional Programming Completion.
> (http://www.icfpco...) Most teems who have put up there reviews
> have talked about having to switch programming languages to something
> faster. Most did this before trying to change algorithms. Many of the
> successful teems ended up using the 'rope' from the C++ Standard
> Template Library.
>
> please see:
>
> http://en.wikipedia.org/...(computer_science)
> http://www.sgi.com/tech/stl/rop...
> http://www.cs.ubc.ca/local/reading/proceedings/spe91-95/spe/vol25/issue12/...
>
> I tried writing a rope library in ruby and got an order of magnitude
> better performance when compared to a String implementation (6.5 to 48.5
> iterations per second -- to successfully solve the problem I estimated I
> would need ~2000 iterations per second)

Did you ever publish this?

Regards,

Dan