comp.lang.ruby

Scan for Tokens

Raul Raul

11/11/2007 2:08:00 AM

I am looking for the best way to break an input string into individual
tokens (I do not want to use a lexer library); I found some Ruby
programs that do it by "nibbling" at the string, like this (for
simplicity, the tokens are simply printed):
str = "20 * sin(x) + ..."

while str.length > 0
  if    str.sub!(/\A\s*(\d+)/) { |m| puts "nr: #{m}" ; '' }
  elsif str.sub!(/\A\s*(\w+)/) { |m| puts "func: #{m}" ; '' }
  # ... more token patterns
  end
end

This works, but it is very inefficient because the string has to be
modified on every iteration (a variation is to use str.match and then
set str = post_match, which is probably even worse).
I was looking for the equivalent of what Perl calls "walking the string"
(if $str =~ /\G ../gcxms), picking up one token at a time, starting at
the point where the previous match ended.
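
For comparison, a rough Ruby sketch of the same "walking" idea, assuming a Ruby version where String#match accepts a start offset (with an offset, \G matches at the position where the search begins). The PATTERNS table and the walk helper are made-up names for illustration only:

```ruby
# Hypothetical token table; each pattern is pinned to the current
# offset by \G and captures the token text in group 1.
PATTERNS = {
  number: /\G\s*(\d+)/,
  name:   /\G\s*(\w+)/,
  op:     /\G\s*([-+*\/()])/
}

# Walk the string with an explicit offset instead of modifying it.
def walk(str)
  pos = 0
  tokens = []
  loop do
    kind, m = nil, nil
    PATTERNS.each do |k, re|
      if (m = str.match(re, pos))
        kind = k
        break
      end
    end
    break unless m            # nothing matched at this offset: stop
    tokens << [kind, m[1]]
    pos = m.end(0)            # continue right after the matched text
  end
  tokens
end

walk("20 * sin(x)").each { |kind, text| puts "#{kind}: #{text}" }
```

The input string is never changed; only the integer offset moves forward.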

I saw the mention of \G with scan in the Pickaxe, but I could not make
scan work 'one token at a time'; I had to list all the tokens in a single
pattern, and then I had to find out which token had matched, i.e.:

str.scan(/\G\s* (\d+ | \*\* | [+] | [(] | ..)/xm) do |m|
  if    m[0].match(/\A\d+\z/)  then puts "number: #{m}"
  elsif m[0].match(/\A\*\*\z/) then puts "power: #{m}"
  ..

It worked perfectly (almost to my surprise!), but it seems odd (un-Ruby-like)
to have to repeat the tokens (even though in my real code I used
regexp variables to avoid hardcoding them twice, it is still a repetition).
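
One way the repetition might be avoided while still using scan: give each token kind its own capture group, so the position of the non-nil group says which alternative matched. This is only a sketch; the TOKEN_KINDS labels, the exact set of patterns, and the scan_tokens helper are assumptions made for illustration:

```ruby
# Labels line up one-to-one with the capture groups in TOKEN_RE.
TOKEN_KINDS = [:number, :power, :operator, :paren, :name]
TOKEN_RE    = /\G\s*(?:(\d+)|(\*\*)|([-+*\/])|([()])|(\w+))/

def scan_tokens(str)
  tokens = []
  # With groups in the pattern, scan yields the array of captures;
  # exactly one of them is non-nil for each match.
  str.scan(TOKEN_RE) do |groups|
    idx = groups.index { |g| !g.nil? }
    tokens << [TOKEN_KINDS[idx], groups[idx]]
  end
  tokens
end

scan_tokens("20 * sin(x) ** 2").each { |kind, text| puts "#{kind}: #{text}" }
```

Because of \G, scanning stops quietly at the first character that no alternative accepts, so real code would want to check that the whole input was consumed.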

I looked at 4 Ruby books and I found only platitudes on the subject (or
references to libraries). I would love to hear an elegant way to solve
this,

thanks!

Raul
--
Posted via http://www.ruby-....

2 Answers

Phrogz

11/11/2007 3:47:00 AM


On Nov 10, 6:07 pm, Raul Parolari <raulparol...@gmail.com> wrote:
> I am looking for the best way to break an input string into individual
> tokens (I do not want to use a lexer library)

Look at the StringScanner library[1] included with Ruby. It's simple,
and it's fast. It's the basis of my TagTreeScanner library[2], which
is specialized for parsing arbitrary text and converting it into
hierarchically nested markup (e.g. XML).

[1] http://ruby-doc.org/stdlib/libdoc/strscan/rdoc/...
[2] http://phrogz.net/RubyLibs/OWLScribble/do...
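
A minimal sketch of how StringScanner could be applied to the expression from the question. The token labels, the patterns, and the tokenize helper are made up for illustration; only the StringScanner calls (scan, skip, matched, eos?, pos, peek) come from the library:

```ruby
require 'strscan'

# Returns an array of [kind, text] pairs; the scanner keeps an internal
# pointer, so the input string itself is never modified.
def tokenize(input)
  ss = StringScanner.new(input)
  tokens = []
  until ss.eos?
    ss.skip(/\s+/)                 # drop leading whitespace, if any
    break if ss.eos?
    if    ss.scan(/\d+/)       then tokens << [:number, ss.matched]
    elsif ss.scan(/\w+/)       then tokens << [:name,   ss.matched]
    elsif ss.scan(/\*\*/)      then tokens << [:power,  ss.matched]
    elsif ss.scan(/[-+*\/()]/) then tokens << [:symbol, ss.matched]
    else
      raise ArgumentError,
            "unexpected character #{ss.peek(1).inspect} at offset #{ss.pos}"
    end
  end
  tokens
end

tokenize("20 * sin(x) + 3").each { |kind, text| puts "#{kind}: #{text}" }
```

Each scan call only matches at the current scan pointer and advances it on success, which gives the one-token-at-a-time behaviour without listing every pattern twice.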

Raul Raul

11/11/2007 6:27:00 AM


Gavin Kistner wrote:
> On Nov 10, 6:07 pm, Raul Parolari <raulparol...@gmail.com> wrote:
>> I am looking for the best way to break an input string into individual
>> tokens (I do not want to use a lexer library)
>
> Look at the StringScanner library[1] included with Ruby. It's simple,
> and it's fast. It's the basis of my TagTreeScanner library[2], which
> is specialized for parsing arbitrary text and converting it into
> hierarchically nested markup (e.g. XML).
>
> [1] http://ruby-doc.org/stdlib/libdoc/strscan/rdoc/...
> [2] http://phrogz.net/RubyLibs/OWLScribble/do...

Gavin

I was surprised at first that this basic capability was in a library,
but StringScanner works beautifully, and it is indeed extremely fast.

I will try your TagTreeScanner at the first chance.

Thank you

Raul
--
Posted via http://www.ruby-....