comp.lang.ruby

Creating a new Syntax tokenizer

Gavin Kistner

6/20/2005 1:11:00 AM

I want to write my own wiki markup language. Pure regexp fails me, as
I need a proper parser to keep track of state.
I thought I'd give Syntax a try, but I'm a little confused as to some
of the specifics.

1) What is a 'region', and how do I use the start_region method? It's
not documented in the API, or the source. (I think this is what I
want for nesting tags.)

2) Do I have to close_group and close_region, or do they
automatically get invoked under certain circumstances? (Does starting
one group close the previous one? Do repeated calls to open the same
group cause them to be aggregated together? Is that how accumulating
text in :normal groups works?)

3) How do I keep track of state during successive calls to #step? I
tried an instance variable, but that doesn't seem to exist across calls.

Following is my terrible, broken attempt at the basics of what I'm
after. Am I totally misunderstanding how to use Syntax?


require 'rubygems'
require_gem 'syntax'

class OWLScribble < Syntax::Tokenizer
  def step
    if heading = scan( /^={1,6}/ )
      start_region "heading level #{heading.length}".intern
      $heading_end = Regexp.new( heading + "\\s*" )
    elsif $heading_end && ( heading = scan( $heading_end ) )
      end_region "heading level #{heading.length}".intern
      $heading_end = nil
    elsif char = scan( /^[\r\n]/ )
      start_group :paragraph, char
    elsif scan( /\*\*/ )
      if $inbold
        end_region :bold
        $inbold = nil
      else
        start_region :bold
        $inbold = true
      end
    elsif char = scan( /./ )
      start_group :normal, char
    else
      scan( /[\r\n]/ )
    end
  end
end

Syntax::SYNTAX[ 'owlscribble' ] = OWLScribble

str = <<END
Intro paragraph

= Heading 1 =
First **paragraph** under the heading.

== Second **Heading** = very yes ==
Another paragraph.
END

tokenizer = Syntax.load( "owlscribble" )
tokenizer.tokenize( str ) do |token|
  puts "#{token.group} (#{token.instruction}) #{token}"
end



--
(-, /\ \/ / /\/



2 Answers

Jamis Buck

6/20/2005 2:46:00 AM

On Jun 19, 2005, at 7:10 PM, Gavin Kistner wrote:

> I want to write my own wiki markup language. Pure regexp fails me,
> as I need a proper parser to keep track of state.
> I thought I'd give Syntax a try, but I'm a little confused as to
> some of the specifics.
>
> 1) What is a 'region', and how do I use the start_region method?
> It's not documented in the API, or the source. (I think this is
> what I want for nesting tags.)

Regions are groups that can contain other groups nested within them.
Syntax's Ruby tokenizer uses regions to do syntax highlighting of
strings, and interpolated expressions, for example.

start_region is used identically to start_group--you give it the name
of the group you want to start (or continue, if that group is already
open), and an optional string to get things started. (The string
becomes the starter text for the group.)
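
To make the nesting concrete, here is a minimal, self-contained sketch of how a consumer might turn a flat token stream into nested markup with a stack of open regions. It does not use the Syntax gem: `Tok` and `render` are hypothetical stand-ins, and the instruction names `:region_open`, `:region_close`, and `:none` are assumptions chosen for illustration, so check what `token.instruction` actually reports before relying on them.

```ruby
# Hypothetical stand-in for the tokens a tokenizer yields (not Syntax's own class).
Tok = Struct.new(:text, :group, :instruction)

# Render a flat token stream as nested markup, tracking open regions on a stack.
def render(tokens)
  out = String.new
  open_regions = []
  tokens.each do |tok|
    case tok.instruction
    when :region_open            # assumed name for "a region starts here"
      open_regions.push(tok.group)
      out << "<#{tok.group}>" << tok.text
    when :region_close           # assumed name for "a region ends here"
      expected = open_regions.pop
      raise "mismatched close: #{tok.group}" unless expected == tok.group
      out << tok.text << "</#{tok.group}>"
    else                         # ordinary group text
      out << tok.text
    end
  end
  raise "unclosed regions: #{open_regions.inspect}" unless open_regions.empty?
  out
end

tokens = [
  Tok.new("", :heading, :region_open),
  Tok.new("Second ", :normal, :none),
  Tok.new("", :bold, :region_open),
  Tok.new("Heading", :normal, :none),
  Tok.new("", :bold, :region_close),
  Tok.new("", :heading, :region_close),
]
puts render(tokens)
# prints: <heading>Second <bold>Heading</bold></heading>
```

The stack is what lets a bold run sit inside a heading: each close is checked against the most recently opened region.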

> 2) Do I have to close_group and close_region, or do they
> automatically get invoked under certain circumstances? (Does
> starting one group close the previous one? Do repeated calls to
> open the same group cause them to be aggregated together? Is that
> how accumulating text in :normal groups works?)

close_group is automatically called when you start a new group.
close_region is never automatically called, because regions can be
nested, so unless you have a region that you want to persist to the
end of your document, you need to explicitly call it at some point.

Multiple calls of start_group with the same group name do, indeed,
get concatenated together into a single group.
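
The group behaviour described here can be modelled in a few lines. This is only a sketch of the idea, not Syntax's implementation: `GroupBuffer` and its `flush` method are hypothetical names, and the only thing taken from the thread is that starting a different group closes the previous one while starting the same group keeps appending.

```ruby
# Hypothetical accumulator modelled on the behaviour described above.
class GroupBuffer
  attr_reader :tokens

  def initialize
    @tokens = []          # finished [group, text] pairs
    @group  = nil         # group currently being accumulated
    @chunk  = String.new  # text collected for that group so far
  end

  # Start (or continue) a group. A different name closes the previous group;
  # the same name just appends to the pending text.
  def start_group(group, text = nil)
    flush if group != @group
    @group = group
    @chunk << text if text
  end

  # Emit the pending chunk, if there is one.
  def flush
    @tokens << [@group, @chunk] unless @chunk.empty?
    @chunk = String.new
  end
end

buf = GroupBuffer.new
"ab\ncd".each_char do |ch|
  buf.start_group(ch == "\n" ? :paragraph : :normal, ch)
end
buf.flush
p buf.tokens
# prints: [[:normal, "ab"], [:paragraph, "\n"], [:normal, "cd"]]
```

This is why calling start_group :normal once per character still yields whole runs of text rather than one token per character.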

>
> 3) How do I keep track of state during successive calls to #step? I
> tried an instance variable, but that doesn't seem to exist across
> calls.

Instance variables should work--I use them successfully in the Ruby
tokenizer, for instance. Feel free to contact me off-list and I can
help troubleshoot this if it isn't working for you.
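
As an illustration of state surviving between calls, here is a small stand-alone sketch built on StringScanner from Ruby's standard library. `BoldTracker`, `run`, and the event symbols are hypothetical; the class only mimics a "call step until the input is used up" loop. The state lives in an instance variable set in the constructor. One common way this goes wrong in Ruby, which may or may not be what happened in the original attempt, is assigning `@in_bold` in the class body, where it belongs to the class object rather than to the instance that runs `step`.

```ruby
require 'strscan'

# Hypothetical driver that calls step repeatedly on one object.
class BoldTracker
  attr_reader :events

  def initialize(text)
    @scanner = StringScanner.new(text)
    @in_bold = false   # state that must survive between calls to step
    @events  = []
  end

  # Keep stepping until the scanner has consumed the whole input.
  def run
    step until @scanner.eos?
    @events
  end

  # Consume one piece of input; @in_bold remembers what earlier calls saw.
  def step
    if @scanner.scan(/\*\*/)
      @events << (@in_bold ? :bold_close : :bold_open)
      @in_bold = !@in_bold
    else
      # A run of non-asterisk text, or a single stray character.
      @events << (@scanner.scan(/[^*]+/) || @scanner.getch)
    end
  end
end

p BoldTracker.new("a **b** c").run
# prints: ["a ", :bold_open, "b", :bold_close, " c"]
```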

>
> Following is my terrible, broken attempt at the basics of what I'm
> after. Am I totally misunderstanding how to use Syntax?
>

Without actually trying to run it, I'd say you've got the idea. This
is an interesting use of Syntax--given that Syntax was intended for
use as a highlighter, I wouldn't have thought to use it as a more
general purpose parser, but it can definitely be used for that.
Clever! :)

- Jamis



Gavin Kistner

6/23/2005 2:56:00 AM

>> 3) How do I keep track of state during successive calls to #step?
>> I tried an instance variable, but that doesn't seem to exist
>> across calls.
>>
>
> Instance variables should work--I use them successfully in the Ruby
> tokenizer, for instance. Feel free to contact me off-list and I can
> help troubleshoot this if it isn't working for you.

Hrm, I'll have to try it again sometime. Perhaps I screwed up when I
tried it.

>> Following is my terrible, broken attempt at the basics of what I'm
>> after. Am I totally misunderstanding how to use Syntax?
>
> Without actually trying to run it, I'd say you've got the idea.
> This is an interesting use of Syntax--given that Syntax was
> intended for use as a highlighter, I wouldn't have thought to use
> it as a more general purpose parser, but it can definitely be used
> for that. Clever! :)

Thanks for the help in your response. As noted in my OWLScratch post,
after thinking about the problem more and more, I decided that I
wanted a more state-based solution than Syntax seemed to support,
but I want to thank you very much for the library, as the concepts in
it really helped me in my thinking (and introduced me to
StringScanner - what a gem!).