comp.lang.ruby

[ann] regexp-engine 0.4

Simon Strandgaard

11/25/2003 4:46:00 PM

download:
http://rubyforge.org/download.php/219/regexp-engine-...

homepage:
http://raa.ruby-lang.org/list.rhtml?n...


Try it out; tell me your opinion.

--
Simon Strandgaard



Changes
=======

Non-greedy (lazy) matching has been implemented. You can now do:
/a(.*?)a/.match("0a1a2a3").to_a #=> ["a1a", "1"]
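
For comparison, here is the greedy form next to the lazy one. This is a minimal sketch that uses Ruby's built-in Regexp rather than regexp-engine itself (the engine's own API is not shown in this announcement); since the goal is syntax compatibility, the results should presumably be the same.

```ruby
# Illustration with Ruby's built-in Regexp, not regexp-engine's own API.
input = "0a1a2a3"

# Greedy: .* grabs as much as possible, then backs off to the last "a".
greedy = /a(.*)a/.match(input).to_a

# Lazy: .*? grabs as little as possible, stopping at the next "a".
lazy = /a(.*?)a/.match(input).to_a

p greedy  # => ["a1a2a", "1a2"]
p lazy    # => ["a1a", "1"]
```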

The engine now uses iterators internally. This paves the way
for i18n, so that the engine can operate on Unicode, JIS, etc.


Status
======

The data structure has stabilized and the fundamental operations
are working quite well (they were difficult to implement).
The engine uses iterators, which should make it easy to operate on many
different kinds of input streams (Unicode, UTF-8), but right
now the iterator only works on ASCII.
Performance is not impressive.
What is left is all the easy stuff (character classes, Unicode, optimization).

* features of the scanner so far:
    a|b|c          alternation
    * + ? {n,m}    repeat(min..max)  greedy/lazy
    ( ... )        grouping -> register.. nested repeat also works
    .              match anything except newline
    \1 .. \9       backreferences
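
To make the greedy/lazy distinction in repeat(min..max) concrete: in a backtracking matcher the two modes can differ only in the order in which repeat counts are tried. The sketch below is an assumption about how such a scanner might work, not code from regexp-engine; the method name `repeat_end_candidates` is hypothetical.

```ruby
# Hypothetical sketch, not regexp-engine code.
# Returns the end positions reachable by repeating a one-character test
# (the block) between min and max times, starting at pos. The order of the
# result is the order in which a backtracking matcher would try them.
def repeat_end_candidates(text, pos, min, max, greedy)
  limit = 0
  limit += 1 while limit < max &&
                   pos + limit < text.length &&
                   yield(text[pos + limit])
  return [] if limit < min            # cannot satisfy the minimum count

  counts = (min..limit).to_a
  counts.reverse! if greedy           # greedy tries the longest repeat first
  counts.map { |n| pos + n }
end

text = "0a1a2a3"
dot  = ->(ch) { ch != "\n" }          # "." matches anything except newline

# Start right after the first "a" (index 2), as in /a(.*)a/ and /a(.*?)a/.
p repeat_end_candidates(text, 2, 0, Float::INFINITY, true, &dot)   # => [7, 6, 5, 4, 3, 2]
p repeat_end_candidates(text, 2, 0, Float::INFINITY, false, &dot)  # => [2, 3, 4, 5, 6, 7]
```

The first candidate whose next character is "a" wins: position 5 for greedy (group "1a2") and position 3 for lazy (group "1").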

* features of the parser so far:
    a|b|c            alternation
    *      *?        repeat(0..infinity)  greedy/lazy
    +      +?        repeat(1..infinity)  greedy/lazy
    {n,}   {n,}?     repeat(n..infinity)  greedy/lazy
    ?      ??        repeat(0..1)         greedy/lazy
    {n,m}  {n,m}?    repeat(n..m)         greedy/lazy
    {n}    {n}?      repeat(n..n)         greedy/lazy (does lazy make sense here?)
    ( ... )          group -> register
    .                match anything except newline
    \1 .. \9         backreferences
    \                escape
    special case: illegal ranges are treated as if they were
    ordinary literals.
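
The quantifier table above can be sketched as a small lookup. This is only an illustration, not the parser from regexp-engine: `parse_quantifier` and `QUANTIFIER_FORMS` are hypothetical names, and it assumes that "illegal range" covers both malformed braces such as `{x}` and reversed bounds such as `{5,2}`.

```ruby
# Hypothetical sketch, not the regexp-engine parser.
INFINITY = Float::INFINITY

# Each entry pairs a pattern with a lambda turning its MatchData into [min, max].
QUANTIFIER_FORMS = [
  [/\A\*/,              ->(_) { [0, INFINITY] }],
  [/\A\+/,              ->(_) { [1, INFINITY] }],
  [/\A\?/,              ->(_) { [0, 1] }],
  [/\A\{(\d+),(\d+)\}/, ->(m) { [m[1].to_i, m[2].to_i] }],
  [/\A\{(\d+),\}/,      ->(m) { [m[1].to_i, INFINITY] }],
  [/\A\{(\d+)\}/,       ->(m) { [m[1].to_i, m[1].to_i] }],
].freeze

# Parses the quantifier at the start of src. Returns a hash describing it,
# or nil when src does not start with a legal quantifier, in which case the
# caller would treat the characters as ordinary literals.
def parse_quantifier(src)
  QUANTIFIER_FORMS.each do |pattern, to_range|
    m = pattern.match(src)
    next unless m

    min, max = to_range.call(m)
    return nil if min > max                 # reversed bounds, e.g. {5,2}

    lazy = m.post_match.start_with?("?")    # a trailing ? selects lazy matching
    return { min: min, max: max, greedy: !lazy,
             length: m[0].length + (lazy ? 1 : 0) }
  end
  nil
end

p parse_quantifier("{2,5}")   # => {min: 2, max: 5, greedy: true, length: 5} (exact formatting depends on the Ruby version)
p parse_quantifier("{x}")     # => nil
```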


License
=======

Ruby's license.


About
=====

AEditor needs a regexp engine. You probably think: why not
rely on an existing engine (for instance Ruby's regexp engine)?
Existing engines are not flexible enough. The iterator pattern
provides the needed flexibility, so it should not matter
whether the engine operates on UCS-4, UTF-8 or ASCII.
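
A rough sketch of the idea follows. The class and method names (`AsciiIterator`, `Utf8Iterator`, `literal_match?`) are hypothetical and not taken from regexp-engine; the point is only that the matching code talks to `has_next?` and `next` and never to the raw bytes.

```ruby
# Hypothetical iterator classes, not regexp-engine's actual ones.
# Both expose has_next? and next, returning one integer symbol per call.
class AsciiIterator
  def initialize(string)
    @symbols = string.bytes          # one symbol per byte
    @pos = 0
  end

  def has_next?
    @pos < @symbols.length
  end

  def next
    value = @symbols[@pos]
    @pos += 1
    value
  end
end

class Utf8Iterator < AsciiIterator
  def initialize(string)
    # one symbol per Unicode code point (decoded up front for brevity)
    @symbols = string.dup.force_encoding(Encoding::UTF_8).codepoints
    @pos = 0
  end
end

# Encoding-agnostic matcher: compares a literal against whatever the
# iterator yields, without knowing how the input is encoded.
def literal_match?(iterator, expected)
  expected.all? { |symbol| iterator.has_next? && iterator.next == symbol }
end

text   = "\u00E6ble"                 # first character takes two bytes in UTF-8
wanted = "\u00E6b".codepoints        # [230, 98]

p literal_match?(Utf8Iterator.new(text), wanted)   # => true
p literal_match?(AsciiIterator.new(text), wanted)  # => false (it sees bytes 195, 166, ...)
```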

The goal is to build an engine that is fully compatible with Ruby's
regexp syntax and that can work with iterators.

Eventually, extend the regexp syntax with some editor-specific features.
For instance: marking the point where the cursor should be placed,
matching text that is legal Ruby code, executing a regexp within a
rectangular selection, etc. I am open to other suggestions.

Eventually, re-implement in C++ to gain performance.