Asp Forum - Announcing Reg 0.4.0

vikkous

4/24/2005 2:32:00 AM

I would like to announce the first version, 0.4.0, of Reg, the Ruby
Extended Grammar. Reg is a library for pattern matching in ruby data
structures. Reg provides Regexp-like match and match-and-replace for
all data structures (particularly Arrays, Objects, and Hashes), not
just Strings.

The Reg RubyForge project: http://rubyforge.org/pro...

The Reg Tarball:
http://rubyforge.org/frs/download.php/4199/reg-0.4...

Reg is best thought of in analogy to regular expressions; Regexps are
special data structures for matching Strings; Regs are special data
structures for matching ANY type of ruby data (Strings included, using
Regexps).

This table compares syntax of reg and regexp for various constructs.
Keep in mind that all Regs are ordinary ruby expressions. The special
syntax is achieved by overriding ruby operators.

These abbreviations are used:
re,re1,re2 represent arbitrary regexp subexpressions,
r,r1,r2 represent arbitrary reg subexpressions,
s,t represent any single character (perhaps appropriately escaped, if
the char is magical)

reg              regexp               #description

+[r1,r2,r3]      /re1re2re3/          #sequence
-[r1,r2]         (re1re2)             #subsequence
r.lit            \re                  #escaping a magical
regproc{r}       #{re}                #dynamic inclusion
r1|r2 or :OR     (re1|re2) or [st]    #alternation
~r               [^s]                 #negation (for scalar r and s)
r+0              re*                  #zero or more matches
r+1              re+                  #one or more matches
r-1              re?                  #zero or one matches
r*n              re{n}                #exactly n matches
r*(n..m)         re{n,m}              #at least n, at most m matches
r+n              re{n,}               #at least n matches
r-m              re{,m}               #at most m matches
OB               .                    #a single item
OBS              .*                   #zero or more items
BR[1,2]          \1,\2                #backreference ***
r>>x or sub      sub,gsub             #search and replace ***
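
To make the quantifier rows a bit more concrete, here is a rough plain-Ruby sketch of how a sequence of quantified patterns could be matched against an array with greedy matching and backtracking. It is not Reg's engine: the method name match_seq and the [pattern, min, max] triples are invented for this illustration, and it makes no attempt to avoid exponential worst cases.

```ruby
# Hypothetical sketch, not Reg's engine. Each item is [pattern, min, max];
# a max of nil means "no upper bound".
def match_seq(items, values, item_idx = 0, pos = 0)
  # All items consumed: succeed only if every value was consumed too.
  return pos == values.size if item_idx == items.size

  pattern, min, max = items[item_idx]
  limit = max || values.size

  # Greedily count how many consecutive values match this pattern.
  count = 0
  count += 1 while count < limit &&
                   pos + count < values.size &&
                   pattern === values[pos + count]

  # Try the longest run first, then back off one value at a time.
  count.downto(min) do |n|
    return true if match_seq(items, values, item_idx + 1, pos + n)
  end
  false
end

at_least_three = [[Symbol, 3, nil]]
somewhere      = [[Object, 0, nil], [Symbol, 3, nil], [Object, 0, nil]]

puts match_seq(at_least_three, [:a, :b, :c])     # true
puts match_seq(at_least_three, [:a, :b])         # false
puts match_seq(somewhere, [1, :a, :b, :c, "x"])  # true
```

The [Object, 0, nil] items play the role that OBS plays in the table: they soak up any number of arbitrary items and give them back when the following pattern needs them.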

Here are features of reg that don't have an equivalent in regexp:

r.la                  #lookahead ***
~-[]                  #subsequence negation w/lookahead ***
& or :AND             #all alternatives match
^ or :XOR             #exactly one of alternatives matches
+{r1=>r2}             #hash matcher
-{name=>r}            #object matcher
obj.reg               #turn any ruby object into a reg that matches if
                      #obj.=== succeeds
/re/.sym              #a symbol regex
proceq(klass){rcode}  #a proc{} that responds to === by invoking the
                      #proc's call
OBS as un-anchor      #opposite of ^ and $ when placed at edges of a
                      #reg array (kinda cheesy)
name=r                #named subexpressions
recursive matches via regvariables&regconstants ***

*** = not implemented yet.
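
Several of the entries above lean on ruby's case-equality operator ===. As a rough plain-Ruby sketch of that idea (not Reg's code; the class name ProcMatcher is made up here), classes, ranges, and regexps already act as matchers through ===, and a small wrapper around a block can join in by defining === itself:

```ruby
# Hypothetical illustration only; ProcMatcher is not part of Reg.
class ProcMatcher
  def initialize(klass, &block)
    @klass = klass
    @block = block
  end

  # Match when the value is of the given class and the block returns true.
  def ===(other)
    @klass === other && !!@block.call(other)
  end
end

even = ProcMatcher.new(Integer) { |n| n.even? }

checks = {
  "Integer === 3"      => Integer === 3,        # classes match their instances
  "/baz/ === 'foobaz'" => /baz/ === "foobaz",   # regexps match strings
  "(1..5) === 3"       => (1..5) === 3,         # ranges match members
  "even === 4"         => even === 4,
  "even === '4'"       => even === "4"          # wrong class, block never runs
}
checks.each { |label, result| puts "#{label}: #{result}" }
```

Because every matcher answers the same === call, they can be nested inside each other or used directly in a case/when statement.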

Reg is kind of hard to wrap your mind around, so here are some
examples:

Matches array containing exactly 2 elements; 1st is another array, 2nd
is integer:
+[Array,Integer]

Like above, but 1st is array of arrays of symbol
+[+[+[Symbol.reg+0]+0],Integer]

Matches array of at least 3 consecutive symbols and nothing else:
+[Symbol.reg+3]

Matches array with at least 3 symbols in it somewhere:
+[OBS, Symbol.reg+3, OBS]

Matches array of at most 6 strings starting with 'g'
+[/^g/-6] #no .reg necessary for regexp

Matches array of between 5 and 9 hashes containing a key :k pointing to
something non-nil:
+[ +{:k=>~nil.reg}*(5..9) ]

Matches an object with Integer instance variable @k and property (ie
method) foobar that returns a string with 'baz' somewhere in it:
-{:@k=>Integer, :foobar=>/baz/}

Matches array of 6 hashes with 6 as a value of at least one key,
followed by 18 objects with an attribute @s which is a String:
+[ +{OB=>6}*6, -{:@s=>String}*18 ]
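
For readers who want to see what the simpler examples above check, here is a plain-Ruby sketch of the fixed-length array, hash, and object cases. The helper names (seq_match?, hash_match?, object_match?) and the Thing class are invented for this illustration and are not Reg's API; quantifiers such as +3 or *(5..9) are deliberately left out, since those need a backtracking matcher.

```ruby
# Hypothetical helpers written for this sketch; none of these names come from Reg.

# True when `value` is an Array whose elements match `patterns` one for one.
def seq_match?(patterns, value)
  Array === value &&
    value.size == patterns.size &&
    patterns.zip(value).all? { |pat, item| pat === item }
end

# True when `value` is a Hash and every listed key holds a matching value.
def hash_match?(patterns, value)
  Hash === value &&
    patterns.all? { |key, pat| value.key?(key) && pat === value[key] }
end

# True when each named instance variable or method result matches.
def object_match?(patterns, value)
  patterns.all? do |name, pat|
    if name.to_s.start_with?("@")
      pat === value.instance_variable_get(name)
    else
      value.respond_to?(name) && pat === value.public_send(name)
    end
  end
end

# A throwaway class to match against.
class Thing
  attr_reader :foobar

  def initialize(k, foobar)
    @k = k
    @foobar = foobar
  end
end

non_nil = ->(v) { !v.nil? } # a lambda answers === by calling itself

puts seq_match?([Array, Integer], [[:a], 7])      # compare +[Array,Integer]
puts seq_match?([Array, Integer], [[:a], 7, 8])   # false: one element too many
puts hash_match?({ k: non_nil }, { k: 0 })        # compare +{:k=>~nil.reg}
puts object_match?({ :@k => Integer, :foobar => /baz/ },
                   Thing.new(1, "xbazx"))         # compare -{:@k=>Integer, :foobar=>/baz/}
```

Written out this way, each check takes a helper call and some plumbing; the operator syntax in the examples above packs the same intent into a single expression.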

Status:
Some highly nested vector reg constructions still don't work quite
right. (For examples, search on eat_unworking in regtest.rb.) A number
of features are unimplemented at this point, most notably
backreferences and substitutions.

27 Answers

Jon Raphaelson

4/24/2005 3:52:00 AM

Ok, I'm going to go out on a limb here and say HOLY GOD THIS IS AWESOME.

Sorry for the shouting.

vikkous wrote:
> I would like to announce the first version, 0.4.0, of Reg, the Ruby
> Extended Grammar. Reg is a library for pattern matching in ruby data
> structures. Reg provides Regexp-like match and match-and-replace for
> all data structures (particularly Arrays, Objects, and Hashes), not
> just Strings.

ptkwt

4/24/2005 4:16:00 AM

Wow.

Just curious: what kind of needs led you to develop this?

Phil

In article <1114309915.927676.128220@g14g2000cwa.googlegroups.com>,
vikkous <google@inforadical.net> wrote:
>I would like to announce the first version, 0.4.0, of Reg, the Ruby
>Extended Grammar. Reg is a library for pattern matching in ruby data
>structures. Reg provides Regexp-like match and match-and-replace for
>all data structures (particularly Arrays, Objects, and Hashes), not
>just Strings.

Peter Suk

4/24/2005 4:40:00 AM

On Apr 23, 2005, at 9:34 PM, vikkous wrote:

> I would like to announce the first version, 0.4.0, of Reg, the Ruby
> Extended Grammar.

This is like too good/weird to be true.

--Peter

--
There's neither heaven nor hell, save what we grant ourselves.
There's neither fairness nor justice, save what we grant each other.

Mathieu Bouchard

4/24/2005 5:34:00 AM

vikkous

4/24/2005 7:47:00 AM

That's a long story, and well worth telling.

A long time ago, I wanted a better regexp than regexp. My search ended
when I found an extremely obscure language called gema (the
general-purpose matcher). I'm guessing that I'm the only person to ever
take gema seriously. For a time, I became the world's foremost expert on
gema. Gema is designed around the idea that all computation can be
modeled as pattern and replacement. Everything in gema is pattern and
replacement... essentially everything is done with regexps. I was
fascinated with the idea. This seemed to me to be a much better model
for most programming problems, which typically involve reading input,
transforming it in some way, and writing it out again. Conventional
languages (starting with fortran, and including ruby) are based around
the idea of a program being a long string of formulas. This is great
for math-heavy stuff, but most programming is really about data
manipulation, not math.

But there was trouble in paradise. Gema was wonderful, but weird. The
syntax was cranky. The author had issued one version long ago then
disappeared. Gema code was hard to read, in part because
everythingwasalljammedtogether.
Ifyouinsertspacestomakeitmorereadable,itchangesthesemanticsofyourprogram.
There were strange problems that I never tracked down or fully
characterized. The only data-type was the string. You had to be an
expert at avoiding the invisible pitfalls of the language to get
anywhere. But I did get surprisingly far. I managed to coax gema into
becoming a true parser, and parsing a toy language.
I wanted to write a compiler in gema. Yes, the whole compiler. And
parsing the toy language was already straining its capabilities. It
wasn't the data model; I actually figured out how to model all other
data types using strings. A match-and-replace language is actually much
better suited to most compiler tasks than an algol-like formula
language.

Eventually, I abandoned gema, determined to recreate its glory in a
cleaner form. It was at about this time that I discovered ruby. The
successor to gema was ruma, the ruby matcher. Ruma would be basically
just like gema, but without the problems. Whitespace allowed between
tokens. Proper quotation mechanisms, including nested quotes. And the
language used in the actions (replacements) would be full ruby, instead
of gema's inadequate and crude action language.

Ruma got maybe halfway done... quite a ways, really. As part of ruma, I
needed a ruby lexer to make sense of the actions. This turned out to be
quite a lot harder than I had anticipated; I'm still working on that
lexer.

After grinding away at the lexer for a while, dreaming of ruma in the
meantime, I had a brainstorm. Ruma, like gema, was to be a string-based
language. It only operated on strings. In gema, that was just fine
because everything was strings and you just had to live with that. But
ruby has all these other types, a real type system. Wouldn't it be nice
to have those sophisticated search capabilities for other types too?
Well, since I proved to myself that all data types can be converted to
strings, why not convert the ruby data into strings and then match that
in ruma. Of course, it would be so much nicer to just do the matching
on the data in its original form....

The breakthrough came when I realized how malleable ruby really is. I
had become accustomed to c, which I still love, but in so many ways
it's so much more limited. I didn't really have to write my own parser
and lexer; ruby could do it all for me. I just had to override a bunch
of operators.

After that, it was simple. All I do is override the right operators,
and ruby does the parsing and hands me the match expressions in
already-parsed form. Reg is amazingly small in the end. Most of the
effort and code went into the array matcher, but at least as much
functionality is to be had from the hash and object matchers, which
were trivial.
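
A toy illustration of the operator trick described above, assuming nothing about Reg's internals: define a few operator methods and ruby's own parser hands over the pattern already parsed. The names Pat and Object#pat are made up for this sketch, and reopening Object and Array like this is only reasonable in a throwaway example.

```ruby
# Hypothetical sketch; Pat and Object#pat are invented names, not Reg's API.
class Pat
  def initialize(&test)
    @test = test
  end

  def ===(value)
    @test.call(value)
  end

  # pat1 | pat2 : either side matches
  def |(other)
    Pat.new { |v| self === v || other === v }
  end

  # pat1 & pat2 : both sides match
  def &(other)
    Pat.new { |v| self === v && other === v }
  end

  # ~pat : negation
  def ~
    Pat.new { |v| !(self === v) }
  end
end

class Object
  # obj.pat : wrap anything that answers === in a Pat
  def pat
    matcher = self
    Pat.new { |v| matcher === v }
  end
end

class Array
  # +[p1, p2] : unary plus turns an array of patterns into a sequence matcher
  def +@
    patterns = self
    Pat.new do |v|
      Array === v && v.size == patterns.size &&
        patterns.zip(v).all? { |pattern, item| pattern === item }
    end
  end
end

number_or_name = Integer.pat | String.pat
pair           = +[Symbol, number_or_name]
not_nil        = ~nil.pat

puts pair === [:age, 42]      # true
puts pair === [:age, nil]     # false
puts not_nil === 0            # true
```

No parser or lexer appears anywhere in the sketch; the expression +[Symbol, number_or_name] is ordinary ruby that happens to build a matcher object.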

vikkous

4/24/2005 7:59:00 AM

> Can it also match on IO ? I'm particularly thinking of a stream
> implementation that supports illimited pushback of characters...

I would very much like to do this, but right now, no. I'm not sure
exactly what would be involved in having the array matcher match files
as well; it seems like you might have to rip out the guts of the
backtracking engine to support it... but maybe not. Anyway, stay tuned
for a future release.

Just having the ability to compare regexps directly against files would
be really helpful in the construction of lexers of all sorts. Java has
this; why doesn't ruby?

> Because if it does, then you've got a lexer system that is also good
> as something else than just a damn lexer.

Lexers, parsers, and pattern matching languages get too short a shrift
in my opinion. There's really a lot more they could be used for, if
only people would see... of course, it doesn't help that almost all
existing tools of this kind are string-oriented, and hard to use for
other data.

> And by making regexps unified with the rest of the language, it
> brings Ruby closer to the Icon language, isn't it?

I wouldn't know... please let me know about regexp integration in icon;
maybe there's some features I can steal.

Denis Mertz

4/24/2005 8:46:00 AM

vikkous wrote:

>
> Lexers, parsers, and pattern matching languages get too short a shrift
> in my opinion. There's really a lot more they could be used for, if
> only people would see... of course, it doesn't help that almost all
> existing tools of this kind are string-oriented, and hard to use for
> other data.
>

A small piece of example code could help to open the eyes of people who
don't see what could be done with Reg (like me).

Denis

Christian Neukirchen

4/24/2005 9:18:00 AM

"vikkous" <google@inforadical.net> writes:

> I would like to announce the first version, 0.4.0, of Reg, the Ruby
> Extended Grammar. Reg is a library for pattern matching in ruby data
> structures. Reg provides Regexp-like match and match-and-replace for
> all data structures (particularly Arrays, Objects, and Hashes), not
> just Strings.

Nifty, nifty, nifty. I really need to have a look at that.

How does it compare to the ML style of argument matching, btw?

--
Christian Neukirchen <chneukirchen@gmail.com> http://chneuk...

Lyndon Samson

4/24/2005 12:04:00 PM

On 4/24/05, vikkous <google@inforadical.net> wrote:
>

It seems similar in spirit to JXPath for java, which lets you use XPath
expressions to access objects, Hashes, Arrays, Maps etc., which otherwise
is quite long-winded in java (no snickering please).

http://jakarta.apache.org/commo...

--
Into RFID? www.rfidnewsupdate.com Simple, fast, news.

Its Me

4/25/2005 5:04:00 PM

This looks great!

I have not played with it yet, so hope these questions are not off base:

- can I bind variables to (parts of) matches
- have you thought about the connection to duck typing?
- any convenient way to match "all" ... like r*(size..size)

e.g.
http://groups-beta.google.com/group/comp.lang.ruby/browse_frm/thread/f2d02d53531408e/d1f3c7e641a53cdc?q=itsme213+pattern&rnum=1#d1f3c7...

"vikkous" <google@inforadical.net> wrote in message
news:1114309915.927676.128220@g14g2000cwa.googlegroups.com...
> I would like to announce the first version, 0.4.0, of Reg, the Ruby
> Extended Grammar. Reg is a library for pattern matching in ruby data
> structures. Reg provides Regexp-like match and match-and-replace for
> all data structures (particularly Arrays, Objects, and Hashes), not
> just Strings.

comp.lang.ruby