Hey.
Thought I'd drop in on this discussion. There are several threads in
the newsgroup on the topic of the least intrusive API for XML in Ruby.
As I understand it, there are people who want to hide the details of
XML while working in Ruby. This sounds a lot to me like
serialization, and that's a layer above what XML packages provide. A
serialization package, with minimal intrusion, could provide some
support for namespaces and attributes, and would look a lot like what
the minimalists (as in minimally intrusive) are asking for.
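
To make the serialization idea concrete, here is a rough sketch of what such a layer might look like on top of REXML. The helper name to_element and the convention that keys starting with "@" become attributes are assumptions made up for this illustration; neither is part of REXML, and namespaces are left out entirely.

```ruby
require "rexml/document"

# Hypothetical helper, not part of REXML: turn a nested Hash into an element.
# Assumed convention: keys starting with "@" become attributes, Hash values
# become child elements, and anything else becomes text content.
def to_element(name, value)
  element = REXML::Element.new(name)
  if value.is_a?(Hash)
    value.each do |key, child|
      if key.start_with?("@")
        element.add_attribute(key.delete_prefix("@"), child.to_s)
      else
        element.add_element(to_element(key, child))
      end
    end
  else
    element.text = value.to_s
  end
  element
end

person = { "@id" => "42", "name" => "Matz", "language" => "Ruby" }
puts to_element("person", person)
```

The caller only ever sees a Hash going in and markup coming out, which is roughly what the minimally intrusive camp seems to be asking for.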
Users of XML can generally be divided into two broad camps. There are
those who have some data, and they more or less want or need it to be
XML at some point. On the other side are people who are dealing with
the XML without being too concerned with the content. For those in
the first camp, serialization is a great solution. Those in the
second camp need more control over the data, and a specialized API is
more appropriate. If you've contemplated using YAML instead of XML,
you're probably in the first camp. A common reason for being in the
second camp is that you're getting your data from somewhere else.
In my experience, an XML API can be abstracted only so much before you
begin to lose control over the finer details. High-level APIs are
great for simple documents, but begin to break down when one
introduces comments, processing instructions, entities, and mixed
content. I'd go a step further and suggest that any sufficiently
abstracted API that entirely hides the XML details of an XML document
will be insufficient to handle all possible legal XML documents.
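
A small illustration of the point, using REXML's tree parser. The sample document is made up for this example. It mixes text, a comment, an element, and a processing instruction inside one parent, and a view that only exposes child elements by name quietly drops most of it.

```ruby
require "rexml/document"

# Made-up sample: mixed content with a comment and a processing instruction.
source = "<a>B<!-- note --><b/>tail<?render fast?></a>"
doc = REXML::Document.new(source)

# What the tree actually contains, in document order.
kinds = doc.root.children.map { |child| child.class.name.split("::").last }
p kinds

# What a naive "elements only" view would keep.
element_names = doc.root.elements.map(&:name)
p element_names
```

Five child nodes go in, and only one survives the simplified view.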
All that means, though, is that an API that high-level is insufficient
as the only API available for dealing with XML. What that means to me
is that the high-level API should sit on top of another API that
provides finer control. It doesn't mean that the high-level API isn't
useful or shouldn't be written.
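
One way the layering could look, as a sketch only: a thin wrapper that offers a field-style view and still hands out the underlying REXML element when more control is needed. The class name Record and its methods are hypothetical and exist only for this example.

```ruby
require "rexml/document"

# Hypothetical wrapper; Record is an illustrative name, not a REXML class.
class Record
  # The underlying REXML::Element, exposed for finer control.
  attr_reader :element

  def initialize(name)
    @element = REXML::Element.new(name)
  end

  # High-level view: each field is a child element holding text.
  def []=(field, value)
    child = @element.elements[field] || @element.add_element(field)
    child.text = value.to_s
  end

  def [](field)
    child = @element.elements[field]
    child && child.text
  end
end

record = Record.new("book")
record["title"] = "Programming Ruby"

# Drop down a layer for something the field view cannot express.
record.element.add(REXML::Comment.new(" second edition "))
puts record.element
```

The high-level methods cover the common case, and nothing stops the caller from reaching through to the element underneath.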
By the way, I did try to write a transparent API for REXML a couple of
months ago; it looked something like this:
  a = Node.new
  a << "B"            # => <a>B</a>
  a.b                 # => <a>B<b/></a>
  a.b[1]              # => <a>B<b/><b/></a>
  a.b[1]["x"] = "y"   # => <a>B<b/><b x="y"/></a>
  a.b[0].c            # => <a>B<b><c/></b><b x="y"/></a>
  a.b.c << "D"        # => <a>B<b><c>D</c></b><b x="y"/></a>
I didn't get very far with it; it seemed like terrible hacks were
needed to implement it, and I'm not sure I want to maintain that code,
but if there's enough interest, I might revive it.
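
For the curious, a stripped-down sketch of how such an API might be approached with method_missing. This is not the abandoned REXML code, just an illustration; Node and NodeSet here are standalone, hypothetical classes, the element name is passed in explicitly, and attribute values are not escaped. It also hints at the hacks involved: conversion methods such as to_str and to_ary have to be excluded, and elements named like real methods (name, children, and so on) cannot be reached this way.

```ruby
# Hypothetical sketch; Node and NodeSet are illustrative, not REXML classes.
class Node
  # Names treated as child elements; "to_*" is excluded so Ruby's implicit
  # conversions (to_str, to_ary, ...) are not mistaken for elements.
  ELEMENT_NAME = /\A(?!to_)[a-z]\w*\z/i

  attr_reader :name, :children, :attributes

  def initialize(name)
    @name = name.to_s
    @children = []     # mix of String (text) and Node
    @attributes = {}
  end

  # Append text or a child node.
  def <<(item)
    @children << item
    self
  end

  def [](key)
    @attributes[key]
  end

  def []=(key, value)
    @attributes[key] = value
  end

  # Naive serialization: no escaping of text or attribute values.
  def to_s
    attrs = @attributes.map { |k, v| %( #{k}="#{v}") }.join
    return "<#{@name}#{attrs}/>" if @children.empty?
    "<#{@name}#{attrs}>#{@children.map(&:to_s).join}</#{@name}>"
  end

  # Any unknown, argument-free method is taken as a child element name.
  def method_missing(tag, *args)
    return super unless args.empty? && ELEMENT_NAME.match?(tag.to_s)
    NodeSet.new(self, tag.to_s)
  end

  def respond_to_missing?(tag, include_private = false)
    ELEMENT_NAME.match?(tag.to_s) || super
  end
end

# The children of one parent that share a tag name, created on demand.
class NodeSet
  def initialize(parent, tag)
    @parent = parent
    @tag = tag
    self[0]            # merely naming an element creates it
  end

  # Return the index-th matching child, creating missing ones as needed.
  def [](index)
    matches = @parent.children.select { |c| c.is_a?(Node) && c.name == @tag }
    (matches.size..index).each do
      node = Node.new(@tag)
      @parent << node
      matches << node
    end
    matches[index]
  end

  def <<(item)
    self[0] << item
    self
  end

  # Everything else is forwarded to the first matching node.
  def method_missing(tag, *args, &block)
    self[0].public_send(tag, *args, &block)
  end

  def respond_to_missing?(tag, include_private = false)
    self[0].respond_to?(tag, include_private) || super
  end
end

a = Node.new("a")
a << "B"
a.b[1]["x"] = "y"
a.b.c << "D"
puts a.to_s
```

Even at this size, the fact that merely reading a.b changes the document is the kind of surprise that makes the approach hard to maintain.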
At the opposite end of the spectrum is an API that is heavily tied to
XML technology, like XPath:
  a = Tree.new( "/a/b[2][ @x = 'y' ]" )   # <a><b/><b x="y"/></a>
  a[ "/a/text()" ] = "B"                  # <a>B<b/><b x="y"/></a>
  c = a[ "/a/b/c" ]                       # <a>B<b><c/></b><b x="y"/></a>
  c[ "text()" ] = "D"                     # <a>B<b><c>D</c></b><b x="y"/></a>
Not so nice for constructing documents, but almost peerless for
accessing nodes.
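
The access half of this already works with REXML's XPath support. A short sketch against the finished document from the examples above follows; the Tree class that builds documents from an XPath remains hypothetical and is not shown.

```ruby
require "rexml/document"

doc = REXML::Document.new('<a>B<b><c>D</c></b><b x="y"/></a>')

# Positional and attribute predicates pick out the same element here.
second_b = REXML::XPath.first(doc, "/a/b[2]")
with_x   = REXML::XPath.first(doc, "/a/b[@x='y']")

# match returns every node the path selects.
all_b = REXML::XPath.match(doc, "/a/b")

# text() selects text nodes rather than elements.
c_text = REXML::XPath.first(doc, "/a/b/c/text()")

puts second_b.attributes["x"]
puts all_b.size
puts c_text
```

One path expression replaces what would otherwise be a chain of child lookups and index arithmetic.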
Of course, there are other, more pressing, issues that don't let me
play too much with this stuff; things like validation, bug fixes,
optimizations, XPath support in the streaming APIs, a good lightweight
API... any number of things.
Anyway, diversity is good.