Asp Forum - Avoiding memcpy() during String creation in a C extension

Wincent Colaiuta

2/6/2008 5:30:00 PM

I'm working on a fast wikitext-to-HTML translator written in C and am
trying to optimize it.

Profiling shows that a fair portion of the time is being spent
instantiating temporary strings and much of that time is spent in
memcpy.

In much of the code I could avoid many String instantiations and just
prepare everything in a buffer at the raw byte level, not touching any
Ruby objects, and do a rb_str_new() at the end.

The problem with this approach is that Ruby will alloc space for the
string and then memcpy the buffer, which is wasteful seeing as I've
already done the alloc and I just want to relinquish control of that
block of memory and hand it over to Ruby.

Basically what I would like is a String instantiation method which
could take a pointer to a buffer and basically take ownership of that
buffer instead of calling memcpy() on it.

I've looked in string.c to see if such a function exists and it
appears that it does not. I know I could write my own such function
but that is starting to meddle with the internals a little bit too
much for my liking, and I'd need a separate version for Ruby 1.8 and
another for 1.9. It would be very brittle and could break if changes
are made to Ruby in the future.

Any ideas? Or should I just resign myself to that redundant memcpy()?
We're not talking about a lot of memory, but it's something that can
run thousands of times per second and is currently processing wikitext
at about 6 megabytes per second and I'd like to make it even faster.

Cheers,
Wincent

5 Answers

Tim Hunter

2/6/2008 5:46:00 PM

Wincent Colaiuta wrote:
>
> Basically what I would like is a String instantiation method which
> could take a pointer to a buffer and basically take ownership of that
> buffer instead of calling memcpy() on it.
>
> I've looked in string.c to see if such a function exists and it
> appears that it does not.

Instead of putting your data into a buffer and then asking Ruby to make
a String out of it, could you use rb_str_buf_new() to create a String
with a specific capacity and then put your data into the String's
buffer?

--
Posted via http://www.ruby-....

Wincent Colaiuta

2/6/2008 6:10:00 PM

On 6 feb, 18:46, Tim Hunter <rmag...@gmail.com> wrote:
> Wincent Colaiuta wrote:
>
> > Basically what I would like is a String instantiation method which
> > could take a pointer to a buffer and basically take ownership of that
> > buffer instead of calling memcpy() on it.
>
> > I've looked in string.c to see if such a function exists and it
> > appears that it does not.
>
> Instead of putting your data into a buffer and then asking Ruby to make
> a String out of it, could you use rb_str_buf_new() to create a String
> with a specific capacity and then put your data into the String's
> buffer?

Thanks, Tim.

That would indeed work I think. Didn't know about that function but it
will come in very handy I think. And if the buffer needs to grow as I
append output to it then rb_str_cat should be relatively efficient (I
certainly noticed a huge speed-up when I started using rb_str_cat with
raw buffers rather than creating new Ruby Strings and using
rb_str_append).

Cheers,
Wincent

Phlip

2/6/2008 7:43:00 PM

Wincent Colaiuta wrote:

> I'm working on a fast wikitext-to-HTML translator written in C and am
> trying to optimize it.
>
> Profiling shows that a fair portion of the time is being spent
> instantiating temporary strings and much of that time is spent in
> memcpy.

Shouldn't you just focus on streaming? I always figured that getting faster than
fgetc() and fputc() would requiring reinventing all the memory-mapped files and
DMA that back those up...

--
Phlip

Wincent Colaiuta

2/8/2008 9:58:00 AM

On 6 feb, 20:43, Phlip <phlip2...@gmail.com> wrote:
> Wincent Colaiuta wrote:
> > I'm working on a fast wikitext-to-HTML translator written in C and am
> > trying to optimize it.
>
> > Profiling shows that a fair portion of the time is being spent
> > instantiating temporary strings and much of that time is spent in
> > memcpy.
>
> Shouldn't you just focus on streaming? I always figured that getting faster than
> fgetc() and fputc() would requiring reinventing all the memory-mapped files and
> DMA that back those up...

I'm not sure what you mean. Can you clarify a bit? The translator
consists of an extremely faster Ragel tokenizer/lexer/scanner that
emits tokens one at a time, and a parser that decides what to do with
them. The tokens themselves don't contain copies of the corresponding
substrings, but pointers into the input string. The parser is not a
recursive descent parser but an on-the-fly translator that decides
what to emit as it sees each token (thus avoiding function call
overhead; we don't need the same level of structured parsing that a
recursive descent parser would give us because we're only transforming
wikitext into HTML, not parsing a programming language; wikitext can
be imperfect and have syntax errors whereas a strictly defined
programming language must not).

But when it comes time to emit the output you generally have to
instantiate some strings. As an example, consider parsing the
wikitext, ''foo'', which should be translated into foo. At
the Ruby level you want a simple interface, where you pass in a
standard String object and get back a standard String object:

parser.parse "''foo''"
=> "foo"

So under the covers in the C extension this is what happens:

0. Initialize empty output buffer, currently just an empty Ruby
string
1. Provide tokenizer with a pointer to the start of the input string
(no copies are made)
2. Ask tokenizer for the next token
3. Get an "EM" token (no copy of token text made, just a pointer
into the input string
4. Parser appends "" to output buffer using rb_str_cat (avoiding
instantiating of Ruby String)
5. Ask tokenizer for next token
6. Get a "PRINTABLE" token (no copy of token text made, just a
pointer into the input string)
7. Parser appends "foo" to output buffer using rb_str_cat (again
avoiding instantiating a Ruby String)
8. Ask tokenizer for next token
7. Get an "EM" token
9. Parser appends " to output buffer using rb_str_cat
10. Return finished String to caller

So in a general sense it is a "streaming" transformer (meaning that it
transforms the input on the fly without doing any lookahead except in
some special circumstances where it has to ask the lexer for another
token in order to decide what to emit), but I gather from your comment
that you're referring to something else.

So in looking at the code there are now very few String/Object
instantiations that I can avoid because I am already doing as much as
I can at the raw byte level (with rb_str_cat and memcpy itself).
Nevertheless it looks like those few sites are now the lowest-hanging-
fruit reported in profiling and in order to make it faster that's what
I have to attack. So I sent my original email in this thread because
there are still _some_ temporary instantiations that I could avoid on
occasions where the parser calls out to other functions (which are
actually mostly inlined), and being able to pass in a buffer without
having Ruby memcpy it would be a great help.

Cheers,
Wincent

Phlip

2/8/2008 1:36:00 PM

Wincent Colaiuta wrote:

>> Shouldn't you just focus on streaming? I always figured that getting
>> faster than fgetc() and fputc() would requiring reinventing all the
>> memory-mapped files and DMA that back those up...
>
> I'm not sure what you mean.

http://en.wikipedia.org/wiki/One-pas...

Streaming is the classic paradigm to read a file, process, and write
another file. One character goes into your program at a time, it changes
your internal state, and your program writes one character at a time.

> The tokens themselves don't contain copies of the corresponding
> substrings, but pointers into the input string.

K. Can you start writing this in C++, if you actually need the speed?

> But when it comes time to emit the output you generally have to
> instantiate some strings. As an example, consider parsing the wikitext,
> ''foo'', which should be translated into foo. At the Ruby level
> you want a simple interface, where you pass in a standard String object
> and get back a standard String object:
>
> parser.parse "''foo''"
> => "foo"

My vision is very simple. The '' triggers the program to write , and
changes the state to "in an em". Then the program writes foo, then it
writes and changes the state to "outside an em". At no time we
constructed a string. We collected the "em" from a table of frozen
strings, and we emitted the foo by moving that character pointer from the
preexisting f to after the o.

> So under the covers in the C extension this is what happens:
>
> 0. Initialize empty output buffer, currently just an empty Ruby
> string
> 1. Provide tokenizer with a pointer to the start of the input string
> (no copies are made)
> 2. Ask tokenizer for the next token
> 3. Get an "EM" token (no copy of token text made, just a pointer
> into the input string
> 4. Parser appends "" to output buffer using rb_str_cat (avoiding
> instantiating of Ruby String)

Right there the system extended the catted string. Why not simply drop
 out a file handle? Then hardware will take care of any concatenation
and rebuffering issues...

--
Phlip

comp.lang.ruby

Avoiding memcpy() during String creation in a C extension

Wincent Colaiuta

Tim Hunter

Wincent Colaiuta

Phlip

Wincent Colaiuta

Phlip

x Login to ForumsZone