Asp Forum - Writing parsers?

Paatsch, Bernd

2/14/2006

Hello,

I have been assigned the task of writing a parser, and judging by the files I
need to parse, it will not be trivial. Does anybody know of a website that
explains the steps involved in writing a parser and the pitfalls to avoid?
Any suggestions or help would be appreciated.

Thanks,
Bernd

6 Answers

David Vallner

2/14/2006 12:47:00 AM

On Tuesday 14 February 2006 00:59, Paatsch, Bernd wrote:
> Hello,
>
> I got this great task assigned to write a parser and looking at the files
> to parse is not very trivial. Does anybody know where to find a website
> that would explain steps and pitfalls to avoid writing a parser?
> Any suggestion/help in is appreciated.
>
> Thanks,
> Bernd

http://epaperpress.com/l... seems to be a useful resource, provided you
already know some of the theory behind formal grammars. The two tools it
covers, lex and yacc, and their derivatives are pretty much the open source
standard for writing parsers. I believe there are Ruby bindings / variants
of both.
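
Parser generators of this kind usually split the work into a tokenizer (lexer) and a grammar-driven parser. As a rough illustration of the first half only, here is a minimal hand-written tokenizer using Ruby's standard `strscan` library. The token names and the toy input language are assumptions made up for this sketch, not something taken from the linked resource.

```ruby
require 'strscan'

# Token patterns, tried in order. The names and the toy language are illustrative.
TOKEN_PATTERNS = [
  [:number, /\d+/],
  [:ident,  /[A-Za-z_]\w*/],
  [:op,     /[-+*\/()=]/]
].freeze

# Split source text into [type, text] pairs, skipping whitespace.
def tokenize(source)
  scanner = StringScanner.new(source)
  tokens = []
  until scanner.eos?
    next if scanner.skip(/\s+/)
    # find stops at the first pattern that matches at the current position.
    type, _pattern = TOKEN_PATTERNS.find { |_, pattern| scanner.scan(pattern) }
    raise ArgumentError, "unexpected input at offset #{scanner.pos}" unless type
    tokens << [type, scanner.matched]
  end
  tokens
end

tokenize("x = 12 + y")
# => [[:ident, "x"], [:op, "="], [:number, "12"], [:op, "+"], [:ident, "y"]]
```

Reporting the offset of the first unrecognised character is one of the cheaper ways to avoid the "it just says syntax error" pitfall later on.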

ANTLR is also somewhat used, but you're probably looking at Java there.

David Vallner

Doug H

2/14/2006 1:03:00 AM

http://www.google.com/search?hl=en&am...

:) No seriously, check out ANTLR. Unless you are supposed to write the
parser from scratch.
If you want to do it in ruby, there are options like:
http://split-s.blogspot.com/2005/12/antlr-for...
http://www.zenspider.com/ZSS/Produ...

Timothy Goddard

2/14/2006 11:31:00 AM

I just whipped this up in a bit of free time. It may be a decent
starting point for a pure ruby parser. Note that there is no lookahead
ability.

class ParseError < StandardError; end

class Parser

  @@reductions = {}
  @@reduction_procs = {}
  @@tokens = {}
  @@token_values = {}

  # Parse either a string or an IO object (read all at once) using the
  # rules defined for this parser.
  def parse(input)
    stack = []
    value_stack = []
    text = input.is_a?(IO) ? input.read : input.dup
    loop do
      token, value = retrieve_token(text)
      stack << token
      value_stack << value
      reduce_stack(stack, value_stack)
      if text.length == 0
        if stack.length == 1
          return stack[0], value_stack[0]
        else
          raise ParseError, 'Stack failed to reduce'
        end
      end
    end
  end

  protected

  # Retrieve a single token from the input text and return an array of
  # it and its value.
  def retrieve_token(text)
    @@tokens.each do |regexp, token|
      if md = text.match(regexp)
        # The pattern is anchored with \A, so this strips only the matched prefix.
        text.sub!(regexp, '')
        return [token, @@token_values[token] ? @@token_values[token].call(md.to_s) : nil]
      end
    end
    raise ParseError, "Invalid token in input near #{text}"
  end

  # Compare the stack to reduction rules to reduce any matches found
  def reduce_stack(stack, value_stack)
    loop do
      matched = false
      @@reductions.each do |tokens, result|
        if tokens == stack[stack.length - tokens.length, tokens.length]
          start_pos = stack.length - tokens.length
          stack[start_pos, tokens.length] = result
          value = @@reduction_procs[tokens] ?
            @@reduction_procs[tokens].call(value_stack[start_pos, tokens.length]) :
            nil
          # Wrap in an array so a nil or array result stays a single stack entry.
          value_stack[start_pos, tokens.length] = [value]
          matched = true
          break
        end
      end
      return unless matched
    end
  end

  # Register a token: the pattern is anchored to the start of the remaining text.
  def self.token(regexp, token, &block)
    @@tokens[Regexp.new('\A' + regexp.to_s)] = token
    @@token_values[token] = block
  end

  # Register a reduction, e.g. rule :foo, :bar => :foobar
  def self.rule(*tokens, &block)
    final = tokens.pop
    tokens += final.keys
    result = final.values.first
    @@reductions[tokens] = result
    @@reduction_procs[tokens] = block
  end
end

class TestParser < Parser
  token(/foo/i, :foo) do |s|
    s.upcase
  end
  token(/bar/i, :bar) do |s|
    s.downcase
  end
  token(/mega/i, :mega) do |s|
    3
  end
  rule :foo, :bar => :foobar do |foo, bar|
    foo + bar
  end
  rule :mega, :foobar => :megafoobar do |mega, foobar|
    foobar * mega
  end
end
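
To make the "no lookahead" caveat concrete, here is a small standalone sketch (not part of the Parser class above; every name in it is made up for illustration). It runs the same shift-then-reduce loop over the tokens of 1 + 2 * 3, once reducing as soon as any rule matches and once peeking at the next token before reducing an addition. It assumes Ruby 2.3 or later for the `&.` operator.

```ruby
# Tokens are [type, value] pairs for the input 1 + 2 * 3 (hypothetical format).
TOKENS = [[:num, 1], [:plus, nil], [:num, 2], [:times, nil], [:num, 3]].freeze

# Reduction rules: token types on top of the stack, and how to combine the values.
RULES = {
  [:num, :plus, :num]  => ->(a, _, b) { a + b },
  [:num, :times, :num] => ->(a, _, b) { a * b }
}.freeze

# Shift tokens one at a time and reduce whenever a rule matches the stack top.
# With use_lookahead, an addition is postponed while the next token is :times.
def evaluate(tokens, use_lookahead:)
  types = []
  values = []
  tokens.each_with_index do |(type, value), i|
    types << type
    values << value
    next_type = tokens[i + 1]&.first
    reduce(types, values, use_lookahead ? next_type : nil)
  end
  raise ArgumentError, 'stack failed to reduce' unless types == [:num]
  values.first
end

# Apply rules to the top of the stack until none matches.
def reduce(types, values, next_type)
  loop do
    rule = RULES.keys.find do |pattern|
      types.last(pattern.length) == pattern &&
        !(pattern[1] == :plus && next_type == :times)
    end
    return unless rule
    args = values.pop(rule.length)
    types.pop(rule.length)
    types << :num
    values << RULES[rule].call(*args)
  end
end

evaluate(TOKENS, use_lookahead: false) # => 9, i.e. (1 + 2) * 3
evaluate(TOKENS, use_lookahead: true)  # => 7, i.e. 1 + (2 * 3)
```

Without the peek, the addition is reduced before the multiplication is even seen, so grammars with operator precedence need either one token of lookahead or rules written so that the eager reduction is harmless.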

Robert Klemme

2/14/2006 12:36:00 PM

Paatsch, Bernd wrote:
> Hello,
>
> I got this great task assigned to write a parser and looking at the
> files to parse is not very trivial. Does anybody know where to find a
> website that would explain steps and pitfalls to avoid writing a
> parser?
> Any suggestion/help in is appreciated.

http://raa.ruby-lang.org/pro...
http://raa.ruby-lang.org/project/...

robert

ptkwt

2/15/2006 7:29:00 AM

In article <1139916679.044875.75620@g47g2000cwa.googlegroups.com>,
Timothy Goddard <interfecus@gmail.com> wrote:
>I just whipped this up in a bit of free time. It may be a decent
>starting point for a pure ruby parser. Note that there is no lookahead
>ability.

This is a bit like Grammar:
http://grammar.rubyforg...

Phil

Timothy Goddard

2/15/2006 10:16:00 AM

Grammar looks much more similar to Spirit, a C++ parser framework that looks
really simple to use. It uses a very simple domain-specific language
for writing grammars in C++ code. It's part of the Boost libraries. It
would be my first choice for a medium-speed parser that could be used
quite easily from Ruby with just a few joining bits of C. Parsers in
the style of YACC or Bison are much faster again, but the added
complexity of defining the grammar probably makes using them a premature
optimisation for most tasks.
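
For a feel of what "writing the grammar in the host language" can look like, here is a toy sketch in Ruby. It is only an illustration of the general idea of composing rules with operators; the class and method names (Rule, lit, rx, many, full_match?) are hypothetical and are not the API of Spirit or of the Grammar library.

```ruby
# A rule wraps a matcher taking (text, pos) and returning the new position or nil.
class Rule
  def initialize(&matcher)
    @matcher = matcher
  end

  def match(text, pos = 0)
    @matcher.call(text, pos)
  end

  # Sequence: self followed by other.
  def >>(other)
    Rule.new do |text, pos|
      mid = match(text, pos)
      mid && other.match(text, mid)
    end
  end

  # Ordered choice: self, or other if self fails.
  def |(other)
    Rule.new { |text, pos| match(text, pos) || other.match(text, pos) }
  end

  # Zero or more repetitions of self (stops if a repetition consumes nothing).
  def many
    Rule.new do |text, pos|
      while (nxt = match(text, pos)) && nxt > pos
        pos = nxt
      end
      pos
    end
  end

  # True when the rule consumes the whole input.
  def full_match?(text)
    match(text) == text.length
  end
end

# Rule matching a literal string at the current position.
def lit(str)
  Rule.new { |text, pos| text[pos, str.length] == str ? pos + str.length : nil }
end

# Rule matching a regexp, accepted only if the match starts at the current position.
def rx(regexp)
  Rule.new do |text, pos|
    md = regexp.match(text, pos)
    md && md.begin(0) == pos ? md.end(0) : nil
  end
end

# Grammar: a comma-separated list of optionally negative integers.
number = (lit('-') | lit('')) >> rx(/\d+/)
list   = number >> (lit(',') >> number).many

list.full_match?("1,-22,333") # => true
list.full_match?("1,,2")      # => false
```

This only recognises input; a usable version would also build values or a tree as it goes, which is where most of the real design effort tends to land.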

comp.lang.ruby
