[ANN] Syntax 0.7.0

Jamis Buck

3/24/2005 5:54:00 AM

Syntax is a pure-Ruby framework for doing lexical analysis (and, in
particular, syntax highlighting) of text. It currently sports lexers
for Ruby, XML, and YAML, and an HTML convertor (for colorizing texts in
those languages to HTML).
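The pipeline described above can be pictured as two steps: split text into tokens that each belong to a group, then wrap each group in an HTML `<span>`. The sketch below is a hypothetical toy written with Ruby's standard `strscan` library, not the Syntax API. The rule table `TOY_RULES`, the group names, and `toy_tokenize`/`toy_to_html` are illustrative inventions, and the rules cover only a tiny subset of Ruby.

```ruby
require 'strscan'

# Hypothetical rule table: each group name is paired with the pattern that
# recognizes it. Rules are tried in order, so keywords win over identifiers.
TOY_RULES = [
  [:comment,  /#.*/],
  [:keyword,  /(?:def|end|class|module|if|else|return)\b/],
  [:string,   /'(?:\\.|[^'\\])*'/],
  [:number,   /\d+/],
  [:constant, /[A-Z]\w*/],
  [:ident,    /[a-z_]\w*/],
].freeze

HTML_ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;" }.freeze

# Split +text+ into [group, lexeme] pairs. Anything no rule matches
# (whitespace, punctuation) becomes a one-character :normal token.
def toy_tokenize(text)
  scanner = StringScanner.new(text)
  tokens = []
  until scanner.eos?
    rule = TOY_RULES.find { |_, pattern| scanner.match?(pattern) }
    if rule
      group, pattern = rule
      tokens << [group, scanner.scan(pattern)]
    else
      tokens << [:normal, scanner.getch]
    end
  end
  tokens
end

# Wrap each non-:normal token in <span class="group">, escaping HTML.
def toy_to_html(text)
  toy_tokenize(text).map do |group, lexeme|
    escaped = lexeme.gsub(/[&<>]/, HTML_ESCAPES)
    group == :normal ? escaped : %(<span class="#{group}">#{escaped}</span>)
  end.join
end

puts toy_to_html("def hi(name) 'hi' end")
```

A real lexer has to handle context (heredocs, `/` as division versus regex, interpolation), which a flat rule table like this cannot.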

Links:

Download: http://rubyforge.org/frs/?gr...
User Manual: http://docs.jamisbuck.org/r...

This release is much improved in accuracy and robustness (at least, for
the Ruby lexer--the XML and YAML lexers were not changed). The Ruby
lexer now deals better with many ambiguous cases, and even supports
multiple heredocs on a single line. It accurately colorizes cgi.rb and
mkmf.rb from the standard lib, if that means anything at all to you.
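For anyone wondering what "multiple heredocs on a single line" looks like, here is plain Ruby (not Syntax code). Both heredocs are opened on the same line, and their bodies follow one after another in the order they were opened, so a lexer must remember the pending terminators and consume the bodies in that order.

```ruby
# Two heredocs opened on one line; their bodies follow in order.
first, second = <<ONE, <<TWO
alpha
ONE
beta
TWO

p first
p second
```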

The Syntax framework also supports "regions" now (thanks to flgr for
the suggestion) and sports many bug fixes (thanks to Carl Drinkwater
for discovering most of them). Syntax regions just allow one group to
span (and include) multiple groups--like a string that includes
interpolated expressions and escape sequences.
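One way to picture a region: an open/close pair in the token stream that wraps other tokens, so an HTML converter can nest spans. The token shapes below (`[:open, group]`, `[:close]`, `[:token, group, text]`) and `render_regions` are a hypothetical encoding for illustration, not the event types Syntax actually uses.

```ruby
# Hypothetical token stream for the Ruby literal  "a\n#{b}"
# The :string region spans the whole literal and contains an :escape token
# and an :expr region (the interpolation), which itself contains an :ident.
STREAM = [
  [:open, :string],
  [:token, :punct, '"'],
  [:token, :normal, 'a'],
  [:token, :escape, '\n'],
  [:open, :expr],
  [:token, :punct, '#{'],
  [:token, :ident, 'b'],
  [:token, :punct, '}'],
  [:close],
  [:token, :punct, '"'],
  [:close],
].freeze

# Open a <span> for each region and close it on :close, so inner tokens end
# up nested inside the region's span. (HTML escaping omitted for brevity.)
def render_regions(stream)
  depth = 0
  parts = []
  stream.each do |kind, group, text|
    case kind
    when :open
      depth += 1
      parts << %(<span class="#{group}">)
    when :close
      raise ArgumentError, "unbalanced :close" if depth.zero?
      depth -= 1
      parts << "</span>"
    when :token
      parts << (group == :normal ? text : %(<span class="#{group}">#{text}</span>))
    end
  end
  raise ArgumentError, "unclosed region" unless depth.zero?
  parts.join
end

puts render_regions(STREAM)
```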

For a pretty example (mkmf.rb fully syntax highlighted) see
http://ruby.jamisbuck.org....

The next release will include robustness fixes for the XML and YAML
lexers, as well as a lexer for C. Lexers for Perl, Python, Java, HTML,
and RHTML would be nice as well, if I can get to them. Community
submissions will be gladly accepted, as long as you are okay with your
contributed code being distributed under the BSD license.

Enjoy!

- Jamis

Florian Gross

3/24/2005 1:22:00 PM

Jamis Buck wrote:

> Syntax is a pure-Ruby framework for doing lexical analysis (and, in
> particular, syntax highlighting) of text. It currently sports lexers for
> Ruby, XML, and YAML, and an HTML convertor (for colorizing texts in
> those languages to HTML).

And is indeed a wonderful Ruby library. It's just so very cool to have a
library that marks up Ruby properly with <span> classes. It allows you
to do quite a lot to Ruby code.

Thanks a lot, Jamis, for this very nice library!

> For a pretty example (mkmf.rb fully syntax highlighted) see
> http://ruby.jamisbuck.org....

Another one (lots of new CSS) can be seen here:

http://flgr.0x42.net/highli...

I'll be using the Syntax library for dissecting the submissions of the
IORCC and it is a wonderful help.

If you're recognizing your own code in the above screenshot then let me
tell you that you IMHO did a very nice job with your obfuscation.

> The next release will include robustness fixes for the XML and YAML
> lexers, as well as a lexer for C. Lexers for Perl, Python, Java, HTML,
> and RHTML would be nice as well, if I can get to them. Community
> submissions will be gladly accepted, as long as you are okay with your
> contributed code being distributed under the BSD license.

Having a C lexer will be wonderful as that is exactly something that I'm
currently finding myself needing as well.

I think I'll be able to submit lexers for a few simple languages --
Befunge would be an easy one. But your framework seems to make lexing
more complex languages easy as well, so I might as well try that. Guess
we'll see. :)

Sam Roberts

3/24/2005 2:44:00 PM

Quoting jamis@37signals.com, on Thu, Mar 24, 2005 at 02:54:20PM +0900:
> Syntax is a pure-Ruby framework for doing lexical analysis (and, in
> particular, syntax highlighting) of text. It currently sports lexers
> for Ruby, XML, and YAML, and an HTML convertor (for colorizing texts in
> those languages to HTML).

Would this be an appropriate tool for parsing ruby to generate ctags?

To write a tags file I need to know where I am in ruby's terms (in what
class, module), what was found (method, attribute, constant, class,
...), AND I need to generate a regex that will find this place in the
file. For repeated names this can mean knowing what the entire line
looks like, so that I can put leading whitespace into the regex.
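A tags-file entry of this kind pairs a tag name with a search pattern built from the whole source line, so leading whitespace is preserved and can tell two otherwise identical `class Bar` lines apart. The sketch assumes the Exuberant Ctags extended format (`name<TAB>file<TAB>/^line$/;"<TAB>kind`) and escapes only the `/` delimiter and backslashes; `tag_entry` is a hypothetical helper, not part of any library.

```ruby
# Build one tags-file line in (assumed) Exuberant Ctags extended format.
# The pattern is the full source line anchored with ^...$, with the
# delimiter '/' and backslashes escaped so vi can search for it literally.
def tag_entry(name, file, line_text, kind)
  pattern = line_text.chomp.gsub(/[\\\/]/) { |c| "\\#{c}" }
  %(#{name}\t#{file}\t/^#{pattern}$/;"\t#{kind})
end

p tag_entry("Bar", "foo.rb", "    class Bar\n", "c")
```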

Is Syntax something I should be looking at? It seems there are some
similarities... if you know enough to highlight, maybe you know enough to
generate a ctag?

I'm using rdoc right now, but it is a very large tool, and I would like
something smaller and more malleable, if possible.

Thanks,
Sam

Tobias Luetke

3/24/2005 2:54:00 PM

Thank you so much for updating this wonderful library of yours.

On Thu, 24 Mar 2005 14:54:20 +0900, Jamis Buck <jamis@37signals.com> wrote:
>
> Links:
>
> Download: http://rubyforge.org/frs/?gr...
> User Manual: http://docs.jamisbuck.org/r...
>

--
Tobi
http://www.sn... - Snowboards that don't suck
http://www.h... - Open source book authoring
http://blog.le... - Technical weblog

Jamis Buck

3/24/2005 4:28:00 PM

On Mar 24, 2005, at 7:44 AM, Sam Roberts wrote:

> Quoting jamis@37signals.com, on Thu, Mar 24, 2005 at 02:54:20PM +0900:
>> Syntax is a pure-Ruby framework for doing lexical analysis (and, in
>> particular, syntax highlighting) of text. It currently sports lexers
>> for Ruby, XML, and YAML, and an HTML convertor (for colorizing texts in
>> those languages to HTML).
>
> Would this be an appropriate tool for parsing ruby to generate ctags?
>

Hmmm, maybe. Not in its current incarnation, though. One thing the
lexer doesn't give you right now is the location of each token in the
file. That would be a good addition, though. I'll see about adding that
to the next version.
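Tracking token locations on top of a scanner is cheap: remember the scanner's offset before each match and convert it to a line and column. This is a minimal sketch using the standard `strscan` library; `PositionedToken` and `tokens_with_positions` are hypothetical names, and the whitespace-separated "words" stand in for real tokens.

```ruby
require 'strscan'

PositionedToken = Struct.new(:text, :line, :column)

# Split +source+ into whitespace-separated words, recording the 1-based line
# and 0-based column where each starts. Columns are byte offsets, which
# equal character offsets only for ASCII text.
def tokens_with_positions(source)
  scanner = StringScanner.new(source)
  line = 1
  line_start = 0 # byte offset where the current line begins
  tokens = []
  until scanner.eos?
    if scanner.skip(/\n/)
      line += 1
      line_start = scanner.pos
    elsif scanner.skip(/[^\S\n]+/)
      # horizontal whitespace: nothing to record
    else
      start = scanner.pos
      tokens << PositionedToken.new(scanner.scan(/\S+/), line, start - line_start)
    end
  end
  tokens
end

tokens_with_positions("class Foo\n  def bar\n").each do |t|
  puts "#{t.line}:#{t.column} #{t.text}"
end
```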

> To write a tags file I need to know where I am in ruby's terms (in what
> class, module), what was found (method, attribute, constant, class,
> ...), AND I need to generate a regex that will find this place in the
> file. For repeated names this can mean knowing what the entire line
> looks like, so that I can put leading whitespace into the regex.
>

The lexers that come with Syntax are optimized for syntax highlighting.
You could conceivably write a different lexer module that was optimized
for tag extraction, using the Syntax framework. You'd probably do just
as well to use strscan directly, though.
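Following the strscan suggestion, here is a rough, deliberately naive sketch of tag extraction: it keeps a stack of the enclosing `module`/`class`/`def` names and records the full text of each defining line. It assumes one construct per line and that only `module`, `class` and `def` open blocks closed by a line starting with `end`; any `if`, `do` or `begin` block would confuse it. `Tag` and `extract_tags` are hypothetical names, and the `Outer::Inner` / `Class#method` naming is a choice made here, not a ctags convention.

```ruby
require 'strscan'

Tag = Struct.new(:name, :kind, :line)

# Toy scope-tracking tag extractor (see assumptions above).
def extract_tags(source)
  tags  = []
  stack = [] # one [kind, name] entry per currently open module/class/def
  source.each_line do |line|
    s = StringScanner.new(line)
    s.skip(/[ \t]*/)                      # indentation stays in +line+
    if s.scan(/(module|class|def)[ \t]+/)
      kind = s[1]
      name = s.scan(/[\w:.]+[?!=]?/)       # nil for e.g. `class << self`
      stack.push([kind, name])
      next unless name
      scope = stack[0...-1].reject { |k, _| k == "def" }.map(&:last).compact
      qualified =
        if kind == "def"
          scope.empty? ? name : "#{scope.join('::')}##{name}"
        else
          (scope + [name]).join("::")
        end
      tags << Tag.new(qualified, kind, line.chomp)
    elsif s.scan(/end\b/)
      stack.pop
    end
  end
  tags
end

SAMPLE = <<RUBY
module Foo
  class Bar
    class Bar
    end
  end
end
class Foo::Bar
  def baz
  end
end
RUBY

extract_tags(SAMPLE).each { |t| puts "#{t.name}\t#{t.line.inspect}" }
```

Because each tag keeps the whole line, including its indentation, the two nested `class Bar` lines yield distinguishable search patterns.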

- Jamis

gabriele renzi

3/24/2005 4:46:00 PM

Sam Roberts wrote:

> I'm using rdoc right now, but it is a very large tool, and I would like
> something smaller and more malleable, if possible.
>

Why not ParseTree or Ripper?

Trans

3/24/2005 4:50:00 PM

Speaking of RDoc: did anyone take up the call for a new maintainer? I
would love to see syntax highlighting in RDoc.

T.

Sam Roberts

3/24/2005 5:22:00 PM

Quoting jamis@37signals.com, on Fri, Mar 25, 2005 at 01:27:37AM +0900:
> On Mar 24, 2005, at 7:44 AM, Sam Roberts wrote:
>
> >Quoting jamis@37signals.com, on Thu, Mar 24, 2005 at 02:54:20PM +0900:
> >>Syntax is a pure-Ruby framework for doing lexical analysis (and, in
> >>particular, syntax highlighting) of text. It currently sports lexers
> >>for Ruby, XML, and YAML, and an HTML convertor (for colorizing texts
> >>in those languages to HTML).
> >
> >Would this be an appropriate tool for parsing ruby to generate ctags?
> >
>
> Hmmm, maybe. Not in its current incarnation, though. One thing the
> lexer doesn't give you right now is the location of each token in the
> file. That would be a good addition, though. I'll see about adding that
> to the next version.

I don't need the location in the file; I just need the text of the line:

module Foo
  class Bar
    class Bar
    end
  end
end

The tags would be:
Bar -> regex /  class Bar/
Bar -> regex /    class Bar/
Foo.Bar -> regex /  class Bar/
Foo.Bar.Bar -> regex /    class Bar/

I don't need the line number.

For this:
module Foo
end
class Foo::Bar
end

The tags would be different:
Bar -> /class Foo::Bar/

And for:
class
Foo
end

Different again.

Quoting surrender_it@remove-yahoo.it, on Fri, Mar 25, 2005 at 01:49:52AM +0900:
> Sam Roberts wrote:
>
> >I'm using rdoc right now, but it is a very large tool, and I would like
> >something smaller and more malleable, if possible.
> >
>
> why not ParseTree or ripper ?

I have no idea what Ripper does, but ParseTree just gives symbols; it
doesn't have enough information for me to build a regex like the ones
above, does it?

Making tags is an odd problem. It involves semantic analysis: when you
see class Foo, you need to know if it is in module Bar, or inside class
Joe. But to generate the tag you need access to the original text so
that you can build a regex, which is sensitive to HOW you wrote the
code, not just what the code means. Most tokenizers' goal in life is to
abstract you away from the text, so you just see a stream of syntactic
elements.

RDoc is useful because it does the analysis, but it also keeps the
original text in a form that can (in some cases) be used to regenerate
regexes.

I think it's not a bad place to put it, since tags as another output
format is a reasonable extension of its model.

But... it's really slow (I think it's because of how much data it keeps
in memory). It also doesn't quite give me access to everything I want.
I can hack it, but I'm balking at the chore. Adding an output formatter
was easy and standalone. Hacking its internals... that's another story.

I'm totally open to suggestions. I NEED tags to read code effectively.

I'm faster writing in Ruby than in C, but I read C code way, way, way
faster due to the tool support I have (vim+tags). (I debug C faster, too,
because I have a great debugger: gdb.) I'm not happy about this
situation.

Maybe I should suggest this as one of those Ruby weekly challenges...
Document the tags format and the goals, and let people choose. The rule
is that there are no rules: you can use any tool/library you want, even
non-Ruby, and let the best code win. If it's non-Ruby, well, that would
point out an area where Ruby could use some work.

Btw, syntax highlighting with RDoc should be easy; it tokenizes the input.

Cheers,
Sam

Florian Gross

3/24/2005 5:31:00 PM

Sam Roberts wrote:

>>why not ParseTree or ripper ?
>
> I have no idea what ripper does, but parse tree just gives symbols, it
> doesn't have enough information for me to build a regex, as above, does
> it?

Ripper basically is Ruby's integrated Ruby parser. It will invoke
callbacks for every kind of construct it encounters.

This code snippet ought to get you started with it:

irb(main):017:0> class MyParser < Ripper
irb(main):018:1> def method_missing(name, *args)
irb(main):019:2> puts "#{name}: #{args.inspect}"
irb(main):020:2> end
irb(main):021:1> end
=> nil
irb(main):022:0> MyParser.new.parse("puts 'Hello World!' if true")
on__scan: ["puts"]
on__IDENTIFIER: ["puts"]
on__scan: [" "]
on__space: [" "]
on__scan: ["'"]
on__new_string: ["'"]
on__scan: ["Hello World!"]
on__add_string: [nil, "Hello World!"]
on__scan: ["'"]
on__string_end: [nil, "'"]
on__scan: [" "]
on__space: [" "]
on__scan: ["if"]
on__KEYWORD: ["if"]
on__argstart: ["Hello World!"]
on__fcall: [:puts, nil]
on__scan: [" "]
on__space: [" "]
on__scan: ["true"]
on__KEYWORD: ["true"]
on__varref: [:true]
on__if_mod: [nil, nil]
=> nil
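The transcript above shows Ripper as it existed in early 2005; the Ripper bundled with current Ruby uses different event names (`on_kw`, `on_const`, `on_sp`, ...), and `Ripper.lex` returns each token together with its `[line, column]` position, which is close to what a tags generator needs. A small sketch against that modern API; `class_lines` is a hypothetical helper that only looks for the `class` keyword followed by a constant, so for `class Foo::Bar` it reports just `Foo`. As noted in the follow-up, `require 'ripper'` is needed first.

```ruby
require 'ripper'

# Report each `class Name` declaration as [name, line_number, full_line].
# Ripper.lex returns [[line, column], event, token, ...] entries (newer
# Rubies append a lexer-state field, which the destructuring ignores).
def class_lines(source)
  lines = source.lines
  tokens = Ripper.lex(source).reject { |_, event, _| event == :on_sp }
  tokens.each_cons(2).map do |(pos, event, tok), (_, next_event, next_tok)|
    next unless event == :on_kw && tok == "class" && next_event == :on_const
    line_no = pos[0]
    [next_tok, line_no, lines[line_no - 1].chomp]
  end.compact
end

src = "module Foo\n  class Bar\n    class Bar\n    end\n  end\nend\n"
class_lines(src).each { |name, line_no, text| puts "#{name} #{line_no}: #{text.inspect}" }
```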

Florian Gross

3/24/2005 5:32:00 PM

Florian Gross wrote:

> Ripper basically is Ruby's integrated Ruby parser. It will invoke
> callbacks for every kind of construct it encounters.
>
> This code snippet ought to get you started with it:

Oh, and you need to do require 'ripper' before you can use it, of course.

Guillaume Marcais

3/24/2005 6:11:00 PM

On Fri, 2005-03-25 at 02:21 +0900, Sam Roberts wrote:

> I'm totally open to suggestions. I NEED tags to read code effectively.
>
> I'm faster writing in ruby than in C, but I read C code way, way, way
> faster due to the tool support I have (vim+tags) (I debug C faster, too,
> because I have a great debugger - gdb.) I'm not happy about this
> situation.

Maybe I don't understand what you need exactly, but Exuberant Ctags
supports both Ruby and vi:
$ ctags --version
Exuberant Ctags 5.5.4, Copyright (C) 1996-2003 Darren Hiebert
Compiled: May 12 2004, 14:32:50
Addresses: <dhiebert@users.sourceforge.net>, http://ctags.sourc...
Optional compiled features: +wildcards, +regex

$ ctags --list-languages | grep -i ruby
Ruby

It works for me with Emacs...

Tell me if I am completely off base.

Cheers,
Guillaume.

comp.lang.ruby