Asp Forum - XML parser - comp.lang.ruby

Cédric H.

7/19/2008 8:42:00 PM

Hi guys,

I'm looking for some information about the XML libraries available in
Ruby.

I've read a few blog posts about the pros and cons of REXML and
libxml, but I still have some questions:

- As I understand it, REXML is part of the Ruby standard library and so
is included in the Ruby distribution?

- libxml is a wrapper for GNOME libxml and must be installed and
compiled with gem?

- Is libxml really a fully validating and compliant parser?

- How do you use XSLT in Ruby? Do you use http://raa.ruby-lang.org/project/...
or http://rubyforge.org/projec... (if I'm right, the second one
is part of libxml)?

As you can see, I'm lost, and I would really appreciate your help or a
pointer to a comprehensive post about XML processing in Ruby.

Thanks !

Cedric

7 Answers

Dejan Dimic

7/19/2008 10:44:00 PM

On Jul 19, 10:42 pm, Cédric H. <cedric.hernalste...@gmail.com> wrote:
> Hi guys,
>
> I'm looking for some information about the xml libraries available in
> Ruby.
>
> I've read a few blog post about the pro's and con's of REXML and
> Libxml but I still have some questions :
>
> - as I understand it REXML is part of ruby standard library and so is
> included in ruby distribution ?
>
> - libxml is a wrapper for gnome libxml and must be installed and
> compiled with gem ?
>
> - is libxml really a full validating and compliant parser ?
>
> - how do you use xslt in Ruby ? do you use http://raa.ruby-lang.org/project/...
> or http://rubyforge.org/projec... (if I'm right the second one
> is part of libxml ? )
>
> As you see I'm lost and I would really appreciate your help or some
> comprehensive post about xml processing in ruby .
>
> Thanks !
>
> Cedric

Parsing and manipulating XML is such a wide subject. There is more than
one bookshelf full of books about it. Doing it with Ruby is no
exception.

Besides the two libraries mentioned, there is also Hpricot
(http://code.whytheluckystiff.net/hpricot/), and you should try it too.

When dealing with XML you should consider the following questions:
Who will run the code, and on what OS?
How big is the XML document?
Is speed a decisive factor?
How much manipulation is required?

Answers to these questions could help you pick the best library, but
you should be familiar with all of them.

Do some research, play a little, and pick the one that appeals to you most.

Phlip

7/20/2008 12:39:00 AM

Cédric H. wrote:

> I'm looking for some information about the xml libraries available in
> Ruby.
>
> I've read a few blog post about the pro's and con's of REXML and
> Libxml but I still have some questions :
>
> - as I understand it REXML is part of ruby standard library and so is
> included in ruby distribution ?

Yes. It's also widely acknowledged as very slow. The name stands for Ruby
Electric XML, but the parser leans heavily on regular expressions, which are
only fast when used carefully. Basing an entire parser on them tends to abuse
them.

This blog post shows how to spot-check compliance issues in the three leading
Ruby XML
parsers:

http://www.oreillynet.com/onlamp/blog/2007/08/assert_hpri...

> - libxml is a wrapper for gnome libxml and must be installed and
> compiled with gem ?

Ordinarily, that process would be mostly harmless. You may already have
libxml2-dev, if you have a GNU platform such as Ubuntu or Cygwin.

However, the current libxml-ruby has a nasty bug. First, it sprays lots of

No definition for ruby_xml_parser_context_options_get

into the console. Then it refuses to install the libxml_so.so file that it just
created. I don't know this bug's status, but because my assert_xpath works best
with libxml, I must overcome it whenever we build a new workstation at work!
Sometimes I must manually copy its executables into Ruby's paths...

(Our production code does not use libxml - only the test code.)

I just tried to install while writing this post, and 0.8.1 might have worked on
Ubuntu.

> - is libxml really a full validating and compliant parser ?

I suspect it's the reference implementation for XML. It certainly takes every
DOCTYPE and schema very seriously!

Better, it actually forgives some errors and keeps working, unlike REXML.

> - how do you use xslt in Ruby ? do you use http://raa.ruby-lang.org/project/...
> or http://rubyforge.org/projec... (if I'm right the second one
> is part of libxml ? )
>
> As you see I'm lost and I would really appreciate your help or some
> comprehensive post about xml processing in ruby .

Sorry! I was knocking 'em down, and you lost me at XSLT.

In a pinch, I would pipe text thru xsltproc, and not worry about deep language
integration. XSLT is nothing but a big filter, so I thought you could use it
without making an object out of it.
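
A minimal sketch of that pipe approach, assuming xsltproc is installed, is on the PATH, and accepts "-" to mean standard input. The helper name xslt_transform and its xsltproc: keyword are made up for illustration; they are not part of any library.

```ruby
require 'open3'

# Hypothetical helper (not from any library): pipes xml_text through the
# external xsltproc command and returns the transformed text.
# Assumption: xsltproc treats the file name "-" as standard input.
def xslt_transform(xml_text, stylesheet_path, xsltproc: 'xsltproc')
  output, errors, status =
    Open3.capture3(xsltproc, stylesheet_path, '-', stdin_data: xml_text)
  # A non-zero exit status means the stylesheet or the input was rejected.
  raise "xsltproc failed: #{errors}" unless status.success?
  output
end
```

You would call it as xslt_transform(File.read('in.xml'), 'style.xsl'), where both file names are placeholders. The trade-off is one extra process per transform in exchange for no compiled Ruby extension.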

--
Phlip

Phlip

7/20/2008 1:00:00 AM

> Beside these two libraries mentioned there is also an Hpricot
> (http://code.whytheluckystiff.net/hpricot/) and you should try it to.

Hpricot is a jack-of-all-trades-master-of-some-of-them. Don't look to it for
schema validation, XSLT, or true XPath.

> When dealing with XML you should consider the following questions:
> Who and on what OS the code will be running?
> How big the XML document is?
> Is the speed a decisive parameter?
> What's the magnitude of manipulation required?

The two XML parser models are DOM and SAX.

DOM converts every tag into an Object (hence Document Object Model), and lets
you traverse the objects. The conversion is slow, and puts the entire document
into memory, simultaneously.
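
For illustration, a small DOM-style sketch using REXML, assuming REXML is available in your Ruby install. The sample document and its element names are invented.

```ruby
require 'rexml/document'

# Invented sample data, just to have something to traverse.
xml = <<~XML
  <library>
    <book id="1"><title>Programming Ruby</title></book>
    <book id="2"><title>The Ruby Way</title></book>
  </library>
XML

# Document.new parses the whole input and builds the full tree in memory.
doc = REXML::Document.new(xml)

# Walk the tree with a path relative to the document.
titles = []
doc.elements.each('library/book/title') { |title| titles << title.text }

# XPath queries run against the same in-memory tree.
ids = REXML::XPath.match(doc, '//book').map { |book| book.attributes['id'] }

puts titles.inspect
puts ids.inspect
```

Every element here becomes a Ruby object before the first query runs, which is the memory and speed cost described above.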

SAX lets you register callbacks to call when an XML reader encounters certain
tags. It treats the input XML as a stream, hence zipping past nodes you don't
need is very fast.

But I don't know the Ruby SAX solution!

--
Phlip

Phillip Oertel

7/20/2008 1:04:00 AM

hi,

you may enjoy reading this!
http://www.rubyinside.com/ruby-xml-crisis-over-libxml-0-8-0-relea...
(posted two days ago)

kind regards,
phillip

Trans

7/20/2008 3:17:00 AM

On Jul 19, 9:03 pm, Phillip Oertel <m...@phillipoertel.com> wrote:
> hi,
>
> you may enjoy reading this! http://www.rubyinside.com/ruby-xml-cr...-libxml-0-8-0-released-...
> (posted two days ago)

FYI, there is still some final fine-tuning going on, so don't expect
everything to be all roses quite yet. But we are close, and might actually
get to a 1.0.0 release soon.

T.

Phlip

7/20/2008 3:28:00 AM

Trans wrote:

>> you may enjoy reading this! http://www.rubyinside.com/ruby-xml-crisis-over-libxml-0-8-0...
>> (posted two days ago)

Tx - that's why my install today worked, right?

> FYI, Still some final fine-tuning going on, so don't expect everything
> to be all roses just quite yet. But we are close, and might actually
> get to to a 1.0.0 release soon.

And to use it with assert_xpath you just gotta put invoke_libxml in your setup...

Douglas A. Seifert

7/20/2008 10:05:00 PM

Phlip wrote:
>
> But I don't know the Ruby SAX solution!
>
REXML supports a "SAX Like" stream listening interface as well as DOM.
See the REXML tutorial at
http://www.germane-software.com/software/rexml/docs/tut...,
scroll down until you see the section headed with "Stream Parsing". The
upshot is you write a class that has callback methods (see
http://www.germane-software.com/software/rexml/doc/classes/REXML/StreamLis...
for a complete list of callbacks) and pass an instance of the class to
REXML's parse_stream method. REXML also supports a SAX2 API, but I have
never used it. Look for the heading "SAX2 Stream Parsing" in the
tutorial link above.
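
A rough sketch of that stream listener approach, assuming REXML is available. The TitleCollector class and the sample XML are invented for illustration; only StreamListener, its callbacks, and parse_stream come from REXML.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Hypothetical listener: collects the text of every <title> element.
class TitleCollector
  include REXML::StreamListener # supplies no-op defaults for unused callbacks

  attr_reader :titles

  def initialize
    @titles = []
    @in_title = false
  end

  # Called when the parser reads an opening tag.
  def tag_start(name, _attributes)
    return unless name == 'title'
    @in_title = true
    @titles << +''
  end

  # Called for character data; text may arrive in more than one chunk.
  def text(data)
    @titles.last << data if @in_title
  end

  # Called when the parser reads a closing tag.
  def tag_end(name)
    @in_title = false if name == 'title'
  end
end

xml = <<~XML
  <library>
    <book id="1"><title>Programming Ruby</title></book>
    <book id="2"><title>The Ruby Way</title></book>
  </library>
XML

listener = TitleCollector.new
REXML::Document.parse_stream(xml, listener)
puts listener.titles.inspect
```

No tree is built here; the listener keeps only the state it needs, which is where the speed and memory savings come from.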

I recently converted a poorly performing DOM-based parsing solution to a
stream-listener-based solution (not SAX2) and saw an order-of-magnitude
improvement in performance.

Saludos,

-Doug
