Asp Forum - Efficient storage of a temporary string

Randy Kramer

3/2/2005 8:26:00 PM

Background: In order to do the parsing I've talked about in another thread, in
many circumstances I need to know the number of spaces before and after the
current token. I'm trying to think about efficient ways to do that--one
might be to do a preprocess pass through the text to figure out how many
spaces separate various tokens then store the tokens and spaces between them
in a temporary in memory data structure, or I'll need a way to backtrack from
the found position of some token to find how many spaces separate it from the
previous token.

In another thread I asked about streams. In this thread I want to ask about
an efficient way to store the intermediate result if I do a preprocessing
pass.

What I envision as a result of the preprocessing pass is a new representation
of the file where all spaces or groups of spaces are replaced by a list of
"tokens" and the numbers of spaces between those tokens or between a token
and the last/next newline. For example, with the TWiki marked up text:

This is a two level bulleted list:
   * Level 1
      * Level 2

The result I'd see is something like this:

bof,0,"This is a two level bulleted list:",0,\n,3,*,1,"Level 1",0,
\n,6,*,1,"Level 2",eof

Aside: I don't necessarily have to break everything down into tokens of a
single word (I didn't in the above), but it might end up being easier.

What makes the most sense as temporary storage of that result? My guess is an
array, which will expand throughout the prescan process (unless I preallocate an
array of an appropriate size--can I do that in Ruby?), and then be destroyed
after the main processing pass. (I'll probably do the main processing pass
by essentially incrementing my way through that array.)

Is there a better approach?

(Aside: At some point I may rewrite the method to do this preprocessing pass
in C.)

Randy Kramer

1 Answer

Robert Klemme

3/2/2005 10:39:00 PM

"Randy Kramer" <rhkramer@gmail.com> wrote in message
news:200503021525.33850.rhkramer@gmail.com...
> Background: In order to do the parsing I've talked about in another thread,
> in many circumstances I need to know the number of spaces before and after
> the current token. I'm trying to think about efficient ways to do that--one
> might be to do a preprocess pass through the text to figure out how many
> spaces separate various tokens then store the tokens and spaces between
> them in a temporary in memory data structure, or I'll need a way to
> backtrack from the found position of some token to find how many spaces
> separate it from the previous token.
>
> In another thread I asked about streams. In this thread I want to ask about
> an efficient way to store the intermediate result if I do a preprocessing
> pass.
>
> What I envision as a result of the preprocessing pass is a new
> representation of the file where all spaces or groups of spaces are
> replaced by a list of "tokens" and the numbers of spaces between those
> tokens or between a token and the last/next newline. For example, with the
> TWiki marked up text:
>
> This is a two level bulleted list:
>    * Level 1
>       * Level 2
>
> The result I'd see is something like this:
>
> bof,0,"This is a two level bulleted list:",0,\n,3,*,1,"Level 1",0,
> \n,6,*,1,"Level 2",eof
>
> Aside: I don't necessarily have to break everything down into tokens of a
> single word (I didn't in the above), but it might end up being easier.
>
> What makes the most sense as temporary storage of that result? My guess is
> an array, which will expand throughout the prescan process (unless I
> preallocate an array of an appropriate size--can I do that in Ruby?),

Yes, you can

>> Array.new 10
=> [nil, nil, nil, nil, nil, nil, nil, nil, nil, nil]

But I'd do that only if array allocation / reallocation proves to be a
performance bottleneck.
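
To make that concrete, here is a small sketch (the size and the variable names are made up for the example). A Ruby Array grows on demand when you append with <<, while Array.new(n) gives you n nil slots to fill by index. Benchmark from the standard library is one way to check whether the difference matters for your input before bothering with preallocation.

```ruby
require 'benchmark'

n = 100_000  # arbitrary size, chosen only for illustration

# Growing on demand: start empty and append.
grown = []
n.times { |i| grown << i }

# Preallocating: n nil slots, filled by index.
prealloc = Array.new(n)
n.times { |i| prealloc[i] = i }

# Both approaches end up with the same contents.
same = (grown == prealloc)

# Time each approach; the numbers vary by machine and Ruby version.
t_grow = Benchmark.realtime { a = []; n.times { |i| a << i } }
t_pre  = Benchmark.realtime { a = Array.new(n); n.times { |i| a[i] = i } }
puts "append: #{t_grow}s, preallocated: #{t_pre}s"
```

If the two timings are close for a realistic file size, the simpler append version is probably the one to keep.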

> and then be destroyed after the main processing pass. (I'll probably do the
> main processing pass by essentially incrementing my way through that array.)
>
> Is there a better approach?
>
> (Aside: At some point I may rewrite the method to do this preprocessing pass
> in C.)

Does this help?

>> s=<<EOF
This is a two level bulleted list:
   * Level 1
      * Level 2
EOF
>> a=[]; s.scan %r{"[^"]*"|\S+|\n|\s+}xo do |m| a << (/\A\s+\z/ =~ m ? m.length : m ) end
=> "This is a two level bulleted list:\n   * Level 1\n      * Level 2\n"
>> a
=> ["This", 1, "is", 1, "a", 1, "two", 1, "level", 1, "bulleted", 1,
"list:", 1, 3, "*", 1, "Level", 1, "1", 1, 6, "*", 1, "Level",
1, "2", 1]
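
With an array shaped like that, the spaces before and after a token are simply its neighbours. A minimal sketch, where spaces_around is a hypothetical helper name and not anything built in. Note that in the scan above a newline is also stored as 1, so it looks the same as a single space; if the line break matters to the parser, the block would have to store "\n" itself instead of its length.

```ruby
text = "This is a two level bulleted list:\n   * Level 1\n      * Level 2\n"

# Same scan as above: whitespace runs become their length,
# everything else stays a string.
tokens = []
text.scan(/"[^"]*"|\S+|\n|\s+/) { |m| tokens << (m =~ /\A\s+\z/ ? m.length : m) }

# Hypothetical helper: counts stored directly before and after the entry
# at index i. Returns 0 on a side where the neighbour is missing or is
# not a count.
def spaces_around(tokens, i)
  before = (i > 0 && tokens[i - 1].is_a?(Integer)) ? tokens[i - 1] : 0
  after  = tokens[i + 1].is_a?(Integer) ? tokens[i + 1] : 0
  [before, after]
end

first_star = tokens.index("*")
p spaces_around(tokens, first_star)  # => [3, 1]
```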

The quoting part of the regexp can be improved to accept escaped quotes
inside a string as well as single quotes, but I guess you get the picture.
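
One way that improvement could look, as a sketch only: it assumes backslash is the escape character and that single-quoted strings follow the same rules as double-quoted ones, neither of which is stated in the original post. The name token_re and the sample string are made up for the example.

```ruby
# Extended (x) mode lets the alternatives be laid out one per line.
token_re = /
    "(?:\\.|[^"\\])*"   # double-quoted string, allowing backslash escapes
  | '(?:\\.|[^'\\])*'   # single-quoted string, allowing backslash escapes
  | \S+                 # any other run of non-whitespace
  | \n                  # a newline on its own
  | \s+                 # any other run of whitespace
/x

# %q keeps the backslashes literally, so the string contains \" and \'.
s = %q{say "a \"quoted\" word" and 'it\'s fine'  end}

a = []
s.scan(token_re) { |m| a << (m =~ /\A\s+\z/ ? m.length : m) }
p a
```

A quote that starts in the middle of a word, or one that is never closed, still falls through to the \S+ alternative, so this only handles quoted strings that begin at a token boundary.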

Also, you can do any type of conversion on the matched string in the block
before you insert the match into the array. If you use grouping in the
regexp, you can probably use which group matched to decide what action to
take.
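
A sketch of the grouping idea: when the regexp has capture groups, scan passes the block one value per group, and only the group that matched is non-nil. The token kinds and symbols used here (:string, :bullet, :word, :newline, :space) are invented for the example, not part of any TWiki grammar.

```ruby
s = "This is a list:\n   * Level 1\n"

a = []
# One capture group per kind of token. [^\S\n]+ means "whitespace other
# than newline", so a line break is never folded into a space count.
s.scan(/("[^"]*")|(\*)|(\S+)|(\n)|([^\S\n]+)/) do |quoted, bullet, word, newline, blanks|
  if quoted
    a << [:string, quoted[1..-2]]   # drop the surrounding quotes
  elsif bullet
    a << [:bullet]
  elsif word
    a << [:word, word]
  elsif newline
    a << [:newline]
  else
    a << [:space, blanks.length]
  end
end
p a
```

Because the bullet group comes before the word group, something like *bold* would be split into a bullet and the word bold*, so a real grammar would need a stricter bullet pattern.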

Kind regards

robert

comp.lang.ruby
