Dennis Ranke
4/4/2007 2:01:00 PM
James Edward Gray II wrote:
> On Apr 4, 2007, at 6:15 AM, Robert Klemme wrote:
>
>> On 04.04.2007 12:00, Peter Szinek wrote:
>>> Robert Klemme wrote:
>>>> On 04.04.2007 10:53, Peter Szinek wrote:
>>>>> I really just need a fast XML parser which is easy to install,
>>>>> that's all. scRUBYt! is a high-level framework, aimed also at
>>>>> non-programmers, so I can not expect that all my potential users
>>>>> are handy with debian's package policy and the joys of libxml
>>>>> installing on win32 :)
>>>>
>>>> Maybe then you'll simply have to decide whether ease of use or
>>>> performance is more important to you.
>>> Should I interpret this as 'decide between REXML and libxml'?
>>> There are really no other alternatives?
>>
>> AFAIK REXML is the only pure Ruby XML parser - and it comes with the
>> standard distribution.
>
> Sounds like it is time for FasterXML. :)
One pointer: REXML ships with a fairly fast pull parser, and it should be
possible to build a lightweight XML document library on top of it. (The
documentation says that the pull parser API should not be considered
stable, but I'm sure that could be sorted out with the REXML author.)
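For anyone who hasn't used it, here is a minimal sketch of how the REXML pull parser is driven. It uses only the calls the code below relies on (has_next?, pull, and event_type / [] on the returned event). The helper name pull_events is made up for illustration: the parser hands back one event per call, and you decide what to do with it.

```ruby
require 'rexml/parsers/pullparser'

# Hypothetical helper: walk a document with REXML's pull parser and
# record the start-tag, end-tag and text events it produces.
def pull_events(xml)
  parser = REXML::Parsers::PullParser.new(xml)
  events = []
  while parser.has_next?
    event = parser.pull
    case event.event_type
    when :start_element
      # event[0] is the tag name, event[1] the attribute hash
      events << [:start, event[0], event[1]]
    when :end_element
      events << [:end, event[0]]
    when :text
      # event[0] is the text as it appears in the source
      events << [:text, event[0]]
    end
    # Other event types (comments, XML declaration, :end_document, ...)
    # are simply skipped in this sketch.
  end
  events
end

p pull_events('<a x="1"><b>hi</b></a>')
```

Nothing is built in memory unless you build it yourself, which is what keeps this approach light.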
As a proof of concept, see the attached code. We use it in our company
to load and process XML files generated by our own tools and by
OpenOffice Calc. I just tested it on a 1 MB XML file taken from an .ods
file, and it loaded successfully in under 2 seconds.
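If you want to check the speed on your own machine, something along these lines should do. It is only a sketch: the row/cell document is synthetic and much simpler than a real .ods content.xml, so the timing it prints won't match the figure above, and synthetic_xml and time_pull_parse are hypothetical helper names.

```ruby
require 'benchmark'
require 'rexml/parsers/pullparser'

# Build a synthetic document of roughly target_bytes bytes.
# The <row>/<cell> layout is a made-up stand-in, not the .ods schema.
def synthetic_xml(target_bytes)
  row = '<row><cell a="1">text</cell><cell a="2">more text</cell></row>'
  count = target_bytes / row.bytesize
  "<table>#{row * count}</table>"
end

# Pull every event, counting start tags, and time the whole pass.
def time_pull_parse(xml)
  starts = 0
  seconds = Benchmark.realtime do
    parser = REXML::Parsers::PullParser.new(xml)
    while parser.has_next?
      starts += 1 if parser.pull.event_type == :start_element
    end
  end
  [starts, seconds]
end

xml = synthetic_xml(1_000_000)
starts, seconds = time_pull_parse(xml)
puts "#{xml.bytesize} bytes, #{starts} elements, #{seconds.round(2)} s"
```

This measures only the raw pull pass; building the Node tree on top adds some extra cost.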
Writing a fast XPath implementation to match this might be quite a
challenge, though. ;)
Dennis
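require 'rexml/parsers/pullparser'

# Lightweight XML tree built on REXML's pull parser.
module XmlSimple
  # Parse the XML file at filename.
  def self.load(filename)
    parse(File.read(filename))
  end

  # Parse an XML string. Returns a synthetic 'root' node whose children
  # are the document's top-level elements.
  def self.parse(string)
    parser = REXML::Parsers::PullParser.new(string)
    Node.new(['root', {}], parser)
  end

  class Node
    # Remove inherited public methods so that child element names such as
    # "type" or "id" reach method_missing. __send__, __id__ and object_id
    # are kept (Ruby warns when object_id is undefined).
    instance_methods(true).each do |m|
      undef_method(m) unless m.to_s =~ /\A__|\Aobject_id\z/
    end

    # Include Enumerable only after the purge above; otherwise map, select,
    # etc. would be undefined along with Object's methods.
    include Enumerable

    attr_reader :name, :attr, :text, :children

    # token is a start_element event (or ['root', {}]). The constructor
    # consumes events up to the matching end tag, building the subtree.
    def initialize(token, parser)
      @name = token[0]
      @attr = token[1]
      @text = ''
      @siblings = [self] # same-named elements under one parent
      @nodes = {}        # element name => first child with that name
      @children = []     # child nodes and text, in document order
      loop do
        if parser.has_next?
          tok = parser.pull
        else
          # End of input: fake the closing tag of the synthetic root.
          tok = REXML::Parsers::PullEvent.new([:end_element, 'root'])
        end
        case tok.event_type
        when :start_element
          node = Node.new(tok, parser)
          @children << node
          if @nodes[tok[0]]
            @nodes[tok[0]].push_sibling(node)
          else
            @nodes[tok[0]] = node
          end
        when :end_element
          unless tok[0] == @name
            raise "mismatched end tag </#{tok[0]}> (expected </#{@name}>)"
          end
          return
        when :text
          @text << tok[0]
          @children << tok[0]
        end
      end
    end

    def push_sibling(node)
      @siblings << node
    end

    # All same-named siblings, starting with this node.
    def to_a
      @siblings
    end

    def each(&block)
      @siblings.each(&block)
    end

    # node.foo returns the first <foo> child, or nil if there is none.
    def method_missing(m, *_args)
      @nodes[m.to_s]
    end

    # node['foo'] also works for names that clash with real methods,
    # e.g. 'text' or 'name'; symbols are accepted too.
    def [](m)
      @nodes[m.to_s]
    end

    def inspect(indent = '')
      r = indent + @name + ":\n"
      indent += '  '
      r << indent + 'attr: ' + attr.inspect + "\n" unless attr.empty?
      r << indent + 'text: ' + text.inspect + "\n" unless text.empty?
      @nodes.each_value do |v|
        v.each { |n| r << n.inspect(indent) }
      end
      r
    end
  end
end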
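The Node class leans on two Ruby idioms: stripping inherited methods so that child element names fall through to method_missing, and mixing in Enumerable over the list of same-named siblings. Here is a stripped-down sketch of just that part, using a hypothetical ChildProxy class rather than the real Node. It shows why the order matters: Enumerable has to be included after the undef loop, because instance_methods(true) also lists methods from modules already included, and those would be undefined along with Object's.

```ruby
# Hypothetical class illustrating the blank-slate + Enumerable idiom.
class ChildProxy
  # Undefine inherited public methods (keeping __send__, __id__ and
  # object_id) so unknown names reach method_missing.
  instance_methods(true).each do |m|
    undef_method(m) unless m.to_s =~ /\A__|\Aobject_id\z/
  end

  # Included after the purge, so map, select, etc. survive.
  include Enumerable

  def initialize(children, siblings)
    @children = children # name => value, looked up by method_missing
    @siblings = siblings # array iterated by Enumerable via each
  end

  def each(&block)
    @siblings.each(&block)
  end

  # Any unknown method name is treated as a child lookup.
  def method_missing(name, *_args)
    @children[name.to_s]
  end
end

proxy = ChildProxy.new({ 'type' => 'cell' }, [1, 2, 3])
puts proxy.type
puts proxy.map { |x| x * 2 }.inspect
```

Note that proxy.class now returns nil as well, since class was undefined and the call falls through to method_missing; that's the price of the blank-slate trick.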