Brian Smith
2/2/2008 11:39:00 PM
> def characters(self, chars):
>
> newchars=[]
> newchars.append(chars.encode('ISO-8859-1'))
The SAX parser may call characters() more than once for a single block of text, and you can't predict how it will split the text. For example, for the input <foo>123</foo>, characters() could be called once:
handler.characters("123")
or twice:
handler.characters("12")
handler.characters("3")
or:
handler.characters("1")
handler.characters("23")
or three times:
handler.characters("1")
handler.characters("2")
handler.characters("3")
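To see the chunking for yourself, here is a rough sketch (Python 3, standard library only). ChunkLogger is a name I made up for illustration, and feeding the document in two pieces is just one way to provoke more than one call. How the text actually gets split depends on the parser, so don't rely on any particular split:

```python
import xml.sax


class ChunkLogger(xml.sax.ContentHandler):
    """Hypothetical handler that records every chunk passed to characters()."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, chars):
        # Each call may carry only part of the element's text.
        self.chunks.append(chars)


parser = xml.sax.make_parser()
logger = ChunkLogger()
parser.setContentHandler(logger)

# Feed the document in two pieces, as might happen when reading from a stream.
parser.feed("<foo>12")
parser.feed("3</foo>")
parser.close()

# The number and size of the chunks are up to the parser.
print(logger.chunks)
# Only the concatenation of the chunks is guaranteed to be the full text.
print("".join(logger.chunks))
```

The only thing you can count on is that joining the chunks in order gives back the original text.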
If you want the whole text block, then you need to do something like this:
in __init__:
    self.newchars = []
in startElement:
    self.newchars = []
in characters:
    self.newchars.append(chars)
in endElement:
    if len(self.newchars) > 0:
        combined = "".join(self.newchars).encode('ISO-8859-1')
        print "Stream read is '%s'" % combined
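Put together as a complete handler, those pieces might look like the sketch below. It is written for Python 3, so it keeps the text as str and stores it in a list instead of printing it. The class name TextCollector and the blocks attribute are my own inventions for the example, not part of the SAX API:

```python
import xml.sax


class TextCollector(xml.sax.ContentHandler):
    """Hypothetical handler that reassembles each element's text."""

    def __init__(self):
        super().__init__()
        self.newchars = []  # chunks seen since the last start tag
        self.blocks = []    # completed text blocks (illustrative only)

    def startElement(self, name, attrs):
        # A new element starts, so begin collecting from scratch.
        self.newchars = []

    def characters(self, chars):
        # May be called several times per text block; just collect.
        self.newchars.append(chars)

    def endElement(self, name):
        if self.newchars:
            # Join once, at the end, when the whole block has arrived.
            self.blocks.append("".join(self.newchars))
        self.newchars = []


handler = TextCollector()
xml.sax.parseString(b"<foo>123</foo>", handler)
print(handler.blocks)  # ['123'], however the parser chunked the text
```

If you need bytes, call .encode('ISO-8859-1') on the joined string, not on the individual chunks. Note that because the list is reset in startElement, text that appears before a child element in mixed content is dropped; that is fine for simple leaf elements like the one in the example.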
I recommend using ElementTree instead; it gives you an element's text as a single string, so you don't have to reassemble it yourself.
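For comparison, a minimal ElementTree sketch (Python 3, using xml.etree.ElementTree from the standard library) for the same <foo>123</foo> input. The variable names are just for illustration:

```python
import xml.etree.ElementTree as ET

# Parse the whole document; ElementTree builds the tree for us.
root = ET.fromstring("<foo>123</foo>")

# .text holds the element's text as one string.
text = root.text
print("Stream read is '%s'" % text)

# Encode only if you actually need bytes in that encoding.
encoded = text.encode("ISO-8859-1")
```

This reads the whole document into memory, which is usually fine unless the input is very large.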
- Brian