Thomas Counsell
2/27/2005 4:18:00 PM
Instiki uses RedCloth, which converts a markup language called Textile.
The other markup converters I've seen around are BlueCloth, which
converts a markup language called Markdown, and the built-in RDoc.
AFAIK none are very close to TWiki markup.
Austin is right that RedCloth uses a bunch of Regexps rather than a
parser, and that the problem with this is clashes between Regexps
rather than speed.
The techniques I've seen in the RedCloth code to reduce clashing are:
1) Complicated regexps that try to be very, very careful about what
they match (but consequently can be vulnerable to stack overflows)
2) Being very careful about the order of the RE replacements
3) Doing the replacement in two stages: first an RE matches a symbol
(e.g. a url) and replaces it in the text with a code (e.g. $1); then,
once all the REs have matched, it goes back through the text replacing
the codes (e.g. $1) with the desired text (<a href=...).
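
A minimal Ruby sketch of technique 3, to make the clash and the fix
concrete. This is not RedCloth's actual code: the method names, the two
toy rules (URLs and _emphasis_), and the NUL-delimited placeholder
format are all assumptions made for illustration.

```ruby
# Naive version: the emphasis rule runs first and mangles a URL that
# happens to contain underscores. (Hypothetical rules, not RedCloth's.)
def render_naive(text)
  text = text.gsub(/_(\w[^_]*)_/) { "<em>#{$1}</em>" }
  text.gsub(%r{https?://[^\s<]+}) { |url| %(<a href="#{url}">#{url}</a>) }
end

# Two-stage version: hide each URL behind a code, run the other rules,
# then swap the codes back for the finished HTML.
def render_two_stage(text)
  stash = []

  # Stage 1: replace every URL with a placeholder such as "\x000\x00".
  text = text.gsub(%r{https?://[^\s<]+}) do |url|
    stash << %(<a href="#{url}">#{url}</a>)
    "\x00#{stash.size - 1}\x00"
  end

  # The remaining rules can no longer see (or damage) the URLs.
  text = text.gsub(/_(\w[^_]*)_/) { "<em>#{$1}</em>" }

  # Stage 2: put the stored HTML back in place of each placeholder.
  text.gsub(/\x00(\d+)\x00/) { stash[$1.to_i] }
end
```

With the input "see http://example.com/a_b_c now _really_", the naive
version wraps the "b" inside the URL in <em> tags, while the two-stage
version leaves the URL intact and only emphasises "really". The sketch
assumes the input never contains NUL characters itself.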
I had a go at re-implementing RedCloth in a more parser-like fashion
(although not using any external library) and found it much easier to
code and understand conceptually, but it worked out much, much slower.
With hindsight, I think that parsers will always tend to be slower
because in wiki markup (unlike in programming languages) most of the
text is not symbolically important. It is therefore quicker to look
for the symbols using a bunch of RE replacements than to consider each
bit of string and ask whether it is a symbol.
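
The contrast can be sketched in Ruby with a single toy rule (*bold*).
Both methods below are hypothetical illustrations, not code from
RedCloth or from the re-implementation mentioned above. The first lets
the regexp engine skip the plain text; the second asks of every
character, in Ruby, whether a symbol starts there.

```ruby
require "strscan"

# Regexp approach: one gsub call; the engine itself hunts for the
# next "*...*" and passes over everything else.
def bold_with_gsub(text)
  text.gsub(/\*([^*\n]+)\*/) { "<b>#{$1}</b>" }
end

# Parser-like approach: at each position, test whether a symbol starts
# here; if not, copy one character to the output and move on.
def bold_with_scanner(text)
  out = +""
  scanner = StringScanner.new(text)
  until scanner.eos?
    if scanner.scan(/\*([^*\n]+)\*/)
      out << "<b>" << scanner[1] << "</b>"
    else
      out << scanner.getch
    end
  end
  out
end
```

Both should return the same HTML for the same input. The second does a
Ruby-level step per plain character, which is the kind of overhead
described above; timing the two with the standard Benchmark module on
real pages would show how large the gap is in practice.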
Having said that, if you choose a parser that spits out C code then it
may be quicker than RE subs in pure Ruby. There do appear to be some
libraries for this, but I'm afraid I haven't any experience to pass on.
Hope that helps
Tom
On 27 Feb 2005, at 14:54, Randy Kramer wrote:
> I have the need to translate several megabytes of TWiki marked up
> text to HTML. In fact, it may not even be a one time thing--I'm
> planning to build a wiki like thing, and will probably keep TWiki
> markup as the native language for storage. Over time, I may extend
> the TWiki markup syntax.
>
> (Aside: I'm aware that Ruby has two other markup languages sometimes
> used for wikis (Red Cloth (?) and Text <something>?)--I may someday
> support those as well, but I'm not immediately interested in
> converting the megabytes of data I have and learning a different
> markup language.)
>
> My impression is that some or many wikis (this is my impression of
> TWiki) don't use a "real parser" (like YACC or whatever), but instead
> simply search and replace in the text using (many) Regular
> Expressions. Conceptually, that seems an easy approach (less learning
> on my part, but probably tedious creation of many REs (or borrowing
> from TWiki's Perl).
>
> I've never used a parser, and am not really that familiar with them,
> but I'm wondering what the tradeoffs might be. I've heard that a
> parser may be easier to modify to extend the syntax. (But, I have a
> feeling that the TWiki markup language might not be "regular" enough
> to be parsed by something like YACC.):
>
> * If I did create the proper grammar rules, would parsing using
>   something like YACC be faster than a bunch of RE replacements?
>
> * Any recommendations for a parser in Ruby? I think there are a
>   couple, I've been doing some Googling / reading and have come
>   across references to parse.rb and (iirc) something called Coco (??).
>
> * Anybody know the approach followed by existing Ruby wikis like
>   Instiki, Ruwiki, etc.?
>
> Other comments, hints, suggestions?
>
> Randy Kramer
>
>