daz
10/5/2003 12:12:00 AM
"Hal Fulton" <hal9000@hypermetrics.com> wrote:
> I've been looking at lex.c and parse.y and parse.c, ...
Pending a correction, lex.c is an unused remnant.
parse.c is ignorable (generated by Yacc from parse.y).
The real ruby lexer is in parse.y (function yylex).
>
> How might one simply break Ruby code into tokens?
>
>
> Hal
>
While writing IRB, Keiju ISHITSUKA seems to have taken
the trouble to expose his lexer to other callers.
Thank you.
ruby-lex is a Ruby emulation of the interpreter's lexer.
(It may differ slightly.)
As part of IRB, it's in the standard distribution.
I haven't seen examples -- this offering tokenizes itself,
but you can change it to target a script file.
#------------------------------------
require 'irb/ruby-lex'    # forward slash works on every platform
include RubyToken

#File.open('testfile.rb') do |infile| # see: lex.set_input
  tree  = []
  ikeys = [:name, :op, :value, :node]   # optional per-token attributes

  lex = RubyLex.new
  DATA.rewind               # back to the top of this file, so it tokenizes itself
  lex.set_input(DATA)       # (DATA) or (infile)
  line = lex.get_readed     # read (past tense;)

  while tk = lex.token
    tkc  = tk.class.to_s.sub(/\ARubyToken::/, '')
    tkih = { :tk      => tkc,
             :line    => tk.line_no,
             :seek    => tk.seek,
             :char_no => tk.char_no }
    # some tokens have extra attributes.
    ikeys.each do |tkk|
      tkih[tkk] = tk.respond_to?(tkk) && tk.send(tkk)
    end
    tree << tkih
    if tkc == 'TkNL'
      # puts line unless line =~ /\A\s*\Z/ # line sep
      line = lex.get_readed # next line
      # Note: read line left here otherwise
      #       position of NL is mis-reported [BUG?].
    end
  end

  # one report line per token, with a blank line after each newline token
  tree.each do |tkh|
    printf("line %-3d @%3d: %-12s", tkh[:line], tkh[:char_no], tkh[:tk])
    printf(" [%s]", tkh[:name]) if tkh[:name]
    tkh.each do |k, v|
      next unless (ikeys - [:name]).include?(k)
      printf(" %s(%s)", k, v) if v
    end
    puts
    puts if tkh[:tk] == 'TkNL'
  end
#end # File.open
__END__
#------------------------------------
There may be other methods of interest in:
lib\ruby\1.8\irb\slex.rb
lib\ruby\1.8\irb\ruby-lex.rb
lib\ruby\1.8\irb\ruby-token.rb
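
For later readers: a minimal sketch of the same idea using Ripper,
which ships with Ruby 1.9 and newer. This is a different lexer from
irb's ruby-lex, not something from this thread; the method name
tokenize and the :text key are my own choices, and the other hash
keys simply mirror the ones in the script above.

```ruby
require 'ripper'

# Hypothetical helper: turn Ruby source text into an array of token hashes.
# Ripper.lex yields entries of the form [[line, column], event, text, ...].
def tokenize(source)
  Ripper.lex(source).map do |(line, col), type, text|
    { :line => line, :char_no => col, :tk => type, :text => text }
  end
end

# Print one report line per token for a small sample.
tokenize("x = 1\n").each do |tkh|
  printf("line %-3d @%3d: %-12s %s\n",
         tkh[:line], tkh[:char_no], tkh[:tk], tkh[:text].inspect)
end
```

To lex a script file instead, pass File.read('testfile.rb') as the
source string.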
daz