Dejan Dimic
9/27/2008 8:39:00 PM
On Sep 27, 9:27 pm, Eric Will <rak...@malkier.net> wrote:
> Hello World,
>
> I am writing an XMPP (Jabber) server in Ruby. XMPP uses XML for its
> protocol. This means I have to do a good deal of XML parsing, in Ruby.
>
> Right now I am using REXML to parse the individual stanzas as they
> come in. However, in order to do this without REXML complaining of
> "multiple root elements" (that is, XMPP is streaming XML over a TCP
> socket, so I only get the root element once) I have to wrap every
> incoming chunk of XMPP with my own <root/> tag, and then ignore that
> after REXML parses it. I am currently unhappy with this approach.
>
> Another option is to use REXML's stream parsing. I don't really like
> this idea. It seems the only benefit of using SAX(ish) parsing is when
> you're dealing with huge documents that you don't want to load into
> memory. This isn't the case. I get maybe 5-10 objects per parse. Most
> of the people I've talked to in XMPP insist on using SAX (or something
> like it, such as REXML's stream parsing). The other reason I don't
> like REXML's stream parsing (or libxml's SAX) is because I have to
> provide a class instance for it to use for the event-parsing, and this
> class has to be a giant state machine, which seems wrong to me. I
> don't want to have to write a complicated class to, in effect, parse
> the XML myself when the XML parser should be doing this for me.
>
> The other options include using hpricot to do the incoming parsing
> (since it's C, and way faster than REXML) and continue to use REXML
> for generating the outgoing XML (I can't seem to figure out how to do
> this in hpricot, if it's even possible). Although, XMPP requires XML
> well-formedness, and hpricot does not do validation (to the best of my
> knowledge). I also like xml-simple, but it uses REXML underneath it
> all, so I'm left with the same issues.
>
> My real question is, is there a GOOD REASON to switch from the scheme I
> currently use? A number of people seem to think it's the "Wrong Thing"
> to do, but I'm not quite sure what the "Right Thing" is. I don't think
> it's SAX.
>
> Thanks for any feedback.
>
> -- rakaur
Every problem can have multiple solutions.
Personally, I would go for SAX processing of the incoming XML
stream.
It cannot be that hard to build an event-driven solution, and the state
machine should not be more complicated than the DOM node processing.
The benefit is that you can start building the response while you
are still processing the XML input.
You can't get much faster than that.
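To give an idea of how small the state machine can be, here is a minimal sketch using REXML's stream parser. The class name StanzaListener and the sample stream are made up for illustration, and the input is a complete string rather than a live socket. The only state it keeps is a depth counter: depth 1 is the stream root, and every element that closes back to depth 1 is a finished stanza.

```ruby
require 'rexml/document'
require 'rexml/parsers/streamparser'
require 'rexml/streamlistener'

# Hypothetical listener: collects each direct child of the stream root
# (a "stanza") into a small REXML::Element tree.
class StanzaListener
  include REXML::StreamListener

  attr_reader :stanzas

  def initialize
    @depth = 0       # how many elements are currently open
    @current = nil   # element being built, nil between stanzas
    @stanzas = []    # completed stanzas, in arrival order
  end

  def tag_start(name, attrs)
    @depth += 1
    return if @depth == 1 # the stream root itself, opened only once

    element = REXML::Element.new(name)
    attrs.each { |key, value| element.add_attribute(key, value) }
    @current.add_element(element) if @current
    @current = element
  end

  def text(data)
    # Ignore text that is not inside a stanza.
    @current.add_text(data) if @current
  end

  def tag_end(_name)
    @depth -= 1
    return if @current.nil?

    if @depth == 1
      @stanzas << @current # stanza complete, hand it over
      @current = nil
    else
      @current = @current.parent
    end
  end
end

# Made-up sample stream with two stanzas under one root.
xml = "<stream><message to='a@example.org'><body>hi</body></message>" \
      "<presence/></stream>"

listener = StanzaListener.new
REXML::Parsers::StreamParser.new(xml, listener).parse
stanza_names = listener.stanzas.map(&:name)
```

Each finished stanza is an ordinary REXML element, so the rest of the server can keep using the DOM-style code it already has. In a real server you would probably dispatch the stanza from tag_end instead of collecting it in an array.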
If you think it's not your cup of tea, that's totally OK.
If you have to parse chunks of XML data, then hpricot is my favorite
choice.
While analyzing the DOM for nodes of interest, preferably with XPath,
you should build the response.
You can do that with hpricot too.
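The DOM route could look roughly like the sketch below. It uses REXML and REXML::XPath instead of hpricot, only because REXML ships with Ruby; the idea is the same. The sample stanza is made up and simplified (no namespaces), and it assumes the chunk is one complete stanza, so it has a single root.

```ruby
require 'rexml/document'

# Hypothetical incoming chunk: one complete, simplified stanza.
chunk = "<iq type='get' id='42' from='a@example.org'><ping/></iq>"

doc = REXML::Document.new(chunk)

# Find the node of interest with XPath.
request = REXML::XPath.first(doc, "/iq[@type='get']")

# Build the reply from what the request contains.
reply = REXML::Element.new('iq')
reply.add_attribute('type', 'result')
reply.add_attribute('id', request.attributes['id'])
reply.add_attribute('to', request.attributes['from'])

# Serialized form, ready to be written to the socket.
response = reply.to_s
```

Because REXML::Document.new raises on malformed input, this also gives you the well-formedness check for each chunk.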
In a word, do it as you see fit, and then try to make it better. :-)