comp.lang.ruby
Scan for Tokens
Raul Raul
11/11/2007 2:08:00 AM
I am looking for the best way to break an input string into individual
tokens (I do not want to use a lexer library); I found some Ruby
programs that do it by "nibbling" at the string, like this (for
simplicity, the tokens are simply printed):
str = "20 * sin(x) + ..."
while str.length > 0
  # each branch strips one token (plus leading whitespace) off the front
  if    str.sub!(/\A\s*(\d+)/) { puts "nr: #{$1}" ; '' }
  elsif str.sub!(/\A\s*(\w+)/) { puts "func: #{$1}" ; '' }
  ..
  end
end
This works, but it is very inefficient, as the string has to be
modified on every iteration (a variation is to use str.match and then
set str = post_match, which is probably even worse).
I was looking for the equivalent of what Perl calls "walking the string"
(if $str =~ /\G ../gcxms), picking up one token at a time, starting at
the point where the previous match ended.
I saw in the Pickaxe the mention of \G with scan, but I could not make
scan work one token at a time; I had to list all the tokens in a single
pattern, and then I had to find out which token had matched, i.e.:
str.scan(/\G\s* (\d+ | \*\* | [+] | [(] | ..)/xm) do |m|
  # m is an array of captures; m[0] is the token text
  if    m[0].match(/\A\d+\z/)  then puts "number: #{m[0]}"
  elsif m[0].match(/\A\*\*\z/) then puts "power: #{m[0]}"
  ..
  end
end
It worked perfectly (almost to my surprise!), but it seems odd (un-Ruby-like)
to have to repeat the token patterns (even though in my real code I used
regexp variables to avoid hardcoding them twice, it is still a repetition).
I looked at 4 Ruby books and I found only platitudes on the subject (or
references to libraries). I would love to hear an elegant way to solve
this.
Thanks!
Raul
2 Answers
Phrogz
11/11/2007 3:47:00 AM
On Nov 10, 6:07 pm, Raul Parolari <raulparol...@gmail.com> wrote:
> I am looking for the best way to break an input string into individual
> tokens (I do not want to use a lexer library)
Look at the StringScanner library[1] included with Ruby. It's simple,
and it's fast. It's the basis of my TagTreeScanner library[2], which
is specialized for parsing arbitrary text and converting it into
hierarchically nested markup (e.g. XML).
[1] http://ruby-doc.org/stdlib/libdoc/strscan/rdoc/...
[2] http://phrogz.net/RubyLibs/OWLScribble/do...
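For illustration, here is a minimal sketch of how the StringScanner approach might look for the arithmetic example in the question. The TOKENS table, the token names, and the tokenize helper are assumptions made up for this sketch, not code from the thread; only scan, skip, eos?, pos, and matched are StringScanner methods. Each pattern is listed once, and the scanner advances its own position instead of modifying the string.

```ruby
require 'strscan'

# Hypothetical token table: each pattern appears exactly once, paired with
# a name. Order matters: :number precedes :ident (since \w also matches
# digits) and :power precedes :op (so "**" is not read as two "*").
TOKENS = [
  [:number, /\d+/],
  [:power,  /\*\*/],
  [:op,     /[-+*\/]/],
  [:lparen, /\(/],
  [:rparen, /\)/],
  [:ident,  /\w+/]
].freeze

# Hypothetical helper: returns an array of [name, text] pairs.
def tokenize(input)
  scanner = StringScanner.new(input)
  tokens = []
  until scanner.eos?
    scanner.skip(/\s+/)           # drop whitespace between tokens
    break if scanner.eos?
    # scan only matches at the current position and returns nil otherwise,
    # so find stops at the first pattern that matches here.
    name, _pattern = TOKENS.find { |_, pattern| scanner.scan(pattern) }
    raise ArgumentError, "unexpected input at offset #{scanner.pos}" unless name
    tokens << [name, scanner.matched]
  end
  tokens
end
```

With these assumed patterns, tokenize("20 * sin(x)") returns [[:number, "20"], [:op, "*"], [:ident, "sin"], [:lparen, "("], [:ident, "x"], [:rparen, ")"]]. The input string is never modified; the scanner only moves an internal offset.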
Raul Raul
11/11/2007 6:27:00 AM
Gavin Kistner wrote:
> On Nov 10, 6:07 pm, Raul Parolari <raulparol...@gmail.com> wrote:
>> I am looking for the best way to break an input string into individual
>> tokens (I do not want to use a lexer library)
>
> Look at the StringScanner library[1] included with Ruby. It's simple,
> and it's fast. It's the basis of my TagTreeScanner library[2], which
> is specialized for parsing arbitrary text and converting it into
> hierarchically nested markup (e.g. XML).
>
> [1] http://ruby-doc.org/stdlib/libdoc/strscan/rdoc/...
> [2] http://phrogz.net/RubyLibs/OWLScribble/do...
Gavin,
I was surprised at first that this basic capability lives in a library,
but StringScanner works beautifully, and it is indeed extremely fast.
I will try your TagTreeScanner at the first opportunity.
Thank you,
Raul