Robert Klemme
1/13/2008 6:02:00 PM
On 13.01.2008 13:37, Marc Hoeppner wrote:
> Thanks for the great responses!
>
> Just for clarification though:
>
> tag_start identifies an element like "gene" ala
>
> def tag_start(name, attrib)
>   if name == "gene"
>     # do something here
>   end
> end
>
> So tag_end is then used if I want to puts everything that was done with
> the element and its children like storing some values in an array or
> something?
Yeah, you could do that. What you do in each callback is completely up
to you: the parser just hands off one event per parsed XML item (start
tag, closing tag, text data etc.) in the order found in the document.
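For the case you describe, a minimal sketch might look like the following. It assumes the element names from the sample data in the attached script (`Gene-ref` with `name`, `start` and `end` children); the class name `GeneCollector` is made up for illustration. `tag_start` opens a record, `text` buffers character data, and `tag_end` either stores the finished child value or pushes the complete record onto an array.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Hypothetical listener: collects one Hash per <Gene-ref> element.
class GeneCollector
  include REXML::StreamListener

  attr_reader :genes

  def initialize
    @genes = []     # finished records, appended in tag_end
    @current = nil  # record being built while inside <Gene-ref>
    @text = nil     # text buffer for the current child element
  end

  def tag_start(name, attrs)
    if name == "Gene-ref"
      @current = {}
    elsif @current
      @text = "".dup  # start buffering text for a child of <Gene-ref>
    end
  end

  def text(s)
    @text << s if @text  # whitespace outside child elements is ignored
  end

  def tag_end(name)
    if name == "Gene-ref"
      @genes << @current  # the whole element is done: store the record
      @current = nil
    elsif @current && @text
      @current[name] = @text.strip
      @text = nil
    end
  end
end

xml = <<~XML
  <root>
    <Gene-ref><name>foo</name><start>17</start><end>42</end></Gene-ref>
    <Gene-ref><name>bar</name><start>43</start><end>50</end></Gene-ref>
  </root>
XML

collector = GeneCollector.new
REXML::Document.parse_stream(xml, collector)
p collector.genes
```

A flat state machine like this is fine as long as the structure is shallow; once elements nest more deeply the bookkeeping grows, which is where the idiom below helps.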
I have attached another example of an idiom that I use frequently. You
may not need it in your case, but who knows? The basic idea is that the
top-level event listener creates a nested listener for each element and
hands processing off to it while maintaining a stack of listeners. That
way you can do different processing based on element name, attributes
or whatever. It might be overkill for your simple example, but OTOH if
you do need complex processing steps based on elements this might be
exactly what you need. For example, you can store any information you
need in a nested listener and do all the processing in its tag_end.
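The same stack idea can also be written without Delegator, with the top-level listener keeping an explicit array of per-element handler objects. This is only a sketch of the pattern under that assumption, not the attached script: `ElementHandler`, `GeneRefHandler` and `StackListener` are hypothetical names. Each handler reports to its parent when its end tag arrives, so `GeneRefHandler` gathers its children's values and does all of its own processing at its end tag.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Hypothetical generic handler: buffers its own text and notifies its
# parent handler when the element ends.
class ElementHandler
  attr_reader :name, :text, :parent

  def initialize(name, parent)
    @name = name
    @parent = parent
    @text = "".dup
  end

  def add_text(s)
    @text << s
  end

  # Called by a child handler when the child element ends (no-op here).
  def child_done(child); end

  # Called when this element's end tag is seen.
  def finish
    parent.child_done(self) if parent
  end
end

# Hypothetical handler for <Gene-ref>: collects child values into a Hash
# and hands the finished record to a sink at its end tag.
class GeneRefHandler < ElementHandler
  def initialize(name, parent, sink)
    super(name, parent)
    @sink = sink
    @fields = {}
  end

  def child_done(child)
    @fields[child.name] = child.text.strip
  end

  def finish
    @sink << @fields
    super
  end
end

# Top-level listener given to REXML; it owns the stack of handlers and
# picks a handler class per element name.
class StackListener
  include REXML::StreamListener

  attr_reader :results

  def initialize
    @stack = []
    @results = []
  end

  def tag_start(name, attrs)
    parent = @stack.last
    handler =
      if name == "Gene-ref"
        GeneRefHandler.new(name, parent, @results)
      else
        ElementHandler.new(name, parent)
      end
    @stack.push(handler)
  end

  def text(s)
    @stack.last.add_text(s) unless @stack.empty?
  end
  alias cdata text

  def tag_end(name)
    @stack.pop.finish
  end
end

xml = "<root><Gene-ref><name>foo</name><start>17</start><end>42</end></Gene-ref></root>"
listener = StackListener.new
REXML::Document.parse_stream(xml, listener)
p listener.results
```

The explicit stack trades Delegator's automatic forwarding of every event for a bit more code in the top-level listener, which some may find easier to follow.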
Kind regards
robert
#!/usr/bin/env ruby
# Robert Klemme 2007
require 'rexml/document'
require 'rexml/streamlistener'
require 'delegate'
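
# Top-level listener handed to REXML. It keeps the current nested
# listener (linked to its parents via #parent) and forwards every event
# it does not handle itself to that listener through Delegator.
class StreamListener < Delegator
  def initialize
    # dummy listener for the level above the document root
    @current = ::NestedStreamListener.new
  end

  def tag_start(name, attrs)
    sl = listener(name, attrs).new
    sl.parent = @current
    @current = sl
    sl.tag_start(name, attrs)
  end

  def tag_end(name)
    @current.tag_end(name)
    @current = @current.parent
  end

  # delegation target for all other events (text, cdata, comment...)
  def __getobj__
    # Delegator is a BasicObject since Ruby 1.9, so call Kernel explicitly
    @current or ::Kernel.raise "Cannot handle beyond root"
  end

  private

  # pick the listener class for an element
  def listener(name, attrs)
    # more complex code here for specific
    # nested listeners
    ::NestedStreamListener
  end
end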
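
# common base for per-element listeners
class BaseNestedStreamListener
  include REXML::StreamListener

  # listener of the enclosing element (nil for the dummy root)
  attr_accessor :parent

  protected

  # all listeners from the root down to and including self
  def parents
    res = []
    sl = self
    while sl
      res.unshift sl
      sl = sl.parent
    end
    res
  end
end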

# demo listener: prints the element path at the start tag and the text
# collected inside the element at the end tag
class NestedStreamListener < BaseNestedStreamListener
  attr_accessor :tag_name

  def tag_start(name, attrs)
    # demo only
    self.tag_name = name
    parents.each do |par|
      print par.tag_name, " "
    end
    puts
    @text = "".dup
  end

  def tag_end(name)
    print " ", @text.inspect, "\n"
  end

  def text(s)
    (@text ||= "".dup) << s
  end
  alias cdata text
end

REXML::Document.parse_stream( DATA, StreamListener.new )
__END__
<root>
<Gene-ref>
<name>foo</name>
<start>17</start>
<end>42</end>
</Gene-ref>
<Gene-ref>
<name>bar</name>
<start>43</start>
<end>50</end>
</Gene-ref>
</root>