comp.lang.ruby

Re: [SOLUTION] GEDCOM Parser (#6)

Jamis Buck

11/8/2004 5:35:00 AM

Florian Gross wrote:
> Here's my solution. It builds a tree of the Gedcom nodes.

And here's mine. It tries to minimize memory usage by writing the XML
for each node as soon as possible. (This was necessary because one of
my tests, a 7 MB GEDCOM file, rapidly exhausted almost all of my 1 GB
of RAM when I used REXML.) Other than that, it is nothing special. I was
swayed by Hans' argument that attributes are most appropriate for
metadata, so I only use them for id and ref.

- Jamis

--
Jamis Buck
jgb3@email.byu.edu
http://www.jamisbuck...
#!/usr/bin/env ruby

# Converts GEDCOM read from ARGF to XML on stdout, writing each element as
# soon as it is seen instead of building a tree in memory.
class GED2XML

  # Matches GEDCOM cross-reference ids such as @I1@.
  IS_ID = /^@.*@$/

  # One parsed GEDCOM line. For lines like "0 @I1@ INDI" the id comes
  # before the tag, so the two fields are swapped into place.
  class Node < Struct.new( :level, :tag, :data, :refid )
    def initialize( line )
      level, tag, data = line.chomp.split( /\s+/, 3 )
      level = level.to_i
      tag, refid, data = data, tag, nil if tag =~ IS_ID
      super level, tag.downcase, data, refid
    end
  end

  def indent( level )
    print " " * ( level + 1 )
  end

  # Escapes the characters that are special in XML text and attributes.
  def safe( text )
    text.
      gsub( /&/, "&amp;" ).
      gsub( /</, "&lt;" ).
      gsub( />/, "&gt;" ).
      gsub( /"/, "&quot;" )
  end

  def process( io )
    # Elements that have been opened but not yet closed.
    node_stack = []

    puts "<gedcom>"
    wrote_newline = true

    io.each_line do |line|
      next if line =~ /^\s*$/
      node = Node.new( line )

      # Close every open element at the same or a deeper level.
      while !node_stack.empty? && node_stack.last.level >= node.level
        prev = node_stack.pop
        indent prev.level if wrote_newline
        print "</#{prev.tag}>\n"
        wrote_newline = true
      end

      indent node.level if wrote_newline
      print "<#{node.tag}"
      print " id=\"#{node.refid}\"" if node.refid

      if node.data
        if node.data =~ IS_ID
          # Data that is itself an id becomes a ref attribute.
          print " ref=\"#{node.data}\">"
        else
          print ">#{safe(node.data)}"
        end
        wrote_newline = false
      else
        puts ">"
        wrote_newline = true
      end

      node_stack << node
    end

    # Close whatever is still open at end of input.
    until node_stack.empty?
      prev = node_stack.pop
      indent prev.level if wrote_newline
      print "</#{prev.tag}>\n"
      wrote_newline = true
    end

    puts "</gedcom>"
  end

end

GED2XML.new.process ARGF
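
For anyone who wants to try the streaming idea without a GEDCOM file on disk, here is a condensed sketch of the same stack-based approach. It is not the script from the post: the method name ged_to_xml, the out parameter and the three-line sample are made up for illustration, and it writes to any object that responds to << (a String here) instead of printing to stdout. Only the stack of open elements is kept in memory, which is the point made in the post.

```ruby
require "stringio"

# Hypothetical condensed variant: same rules as the script in the post, but
# the output goes to `out` so the result can be inspected in memory.
def ged_to_xml(io, out)
  id = /^@.*@$/
  escape = lambda do |s|
    s.gsub("&", "&amp;").gsub("<", "&lt;").gsub(">", "&gt;").gsub('"', "&quot;")
  end
  stack = []            # open elements, stored as [level, tag]
  at_line_start = true  # true when the last thing written was a newline

  close = lambda do |level, tag|
    out << " " * (level + 1) if at_line_start
    out << "</#{tag}>\n"
    at_line_start = true
  end

  out << "<gedcom>\n"
  io.each_line do |line|
    next if line =~ /^\s*$/
    level, tag, data = line.chomp.split(/\s+/, 3)
    level = level.to_i
    # "0 @I1@ INDI": the id comes first, so swap it out of the tag slot.
    tag, refid, data = data, tag, nil if tag =~ id
    tag = tag.downcase

    # Close every open element at the same or a deeper level.
    close.call(*stack.pop) while !stack.empty? && stack.last[0] >= level

    out << " " * (level + 1) if at_line_start
    out << "<#{tag}"
    out << " id=\"#{refid}\"" if refid
    if data.nil?
      out << ">\n"
      at_line_start = true
    else
      out << (data =~ id ? " ref=\"#{data}\">" : ">#{escape.call(data)}")
      at_line_start = false
    end
    stack << [level, tag]
  end
  close.call(*stack.pop) until stack.empty?
  out << "</gedcom>\n"
  out
end

# Made-up sample input, not taken from the quiz data.
sample = "0 @I1@ INDI\n1 NAME John /Smith/\n1 FAMC @F1@\n"
puts ged_to_xml(StringIO.new(sample), String.new)
# Prints:
# <gedcom>
#  <indi id="@I1@">
#   <name>John /Smith/</name>
#   <famc ref="@F1@"></famc>
#  </indi>
# </gedcom>
```

Passing $stdout as out gives the same write-as-you-go behaviour as the original; passing a String is only convenient for small inputs, since it collects the whole document in memory again.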