comp.lang.python

RE: [XML-SIG] SAX characters() output on multiple lines for non-ascii

Brian Smith

2/2/2008 11:39:00 PM

> def characters(self, chars):
>
> newchars=[]
> newchars.append(chars.encode('ISO-8859-1'))

The SAX parser may call characters() more than once for a single text block, and it is free to split the text at any point. For example, given the input <foo>123</foo>, characters() could be called once:
handler.characters("123")
or twice:
handler.characters("12")
handler.characters("3")
or:
handler.characters("1")
handler.characters("23")
or three times:
handler.characters("1")
handler.characters("2")
handler.characters("3")

If you want the whole text block, then you need to do something like this:

in __init__:
self.newchars = []

in startElement:
self.newchars = []

in characters:
self.newchars.append(chars)

in endElement:
if len(self.newchars) > 0:
    combined = "".join(self.newchars).encode('ISO-8859-1')
    print "Stream read is '%s'" % combined
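
Put together as a complete handler, the pieces above might look like the following. This is only a sketch, written for Python 3: the class name TextCollector and the blocks list are made up for illustration, and it also clears the buffer in endElement (not part of the outline above) so that a parent element does not report its child's text a second time.

```python
import xml.sax


class TextCollector(xml.sax.ContentHandler):
    """Hypothetical handler that joins the chunks passed to characters()."""

    def __init__(self):
        super().__init__()
        self.newchars = []   # chunks seen since the last start/end tag
        self.blocks = []     # one joined string per text block

    def startElement(self, name, attrs):
        self.newchars = []

    def characters(self, chars):
        # May be called several times for one text block; just buffer.
        self.newchars.append(chars)

    def endElement(self, name):
        if self.newchars:
            self.blocks.append("".join(self.newchars))
        # Assumption beyond the outline: reset here as well, so the
        # enclosing element does not see this text again.
        self.newchars = []


handler = TextCollector()
xml.sax.parseString("<foo>caf\u00e9 123</foo>".encode("utf-8"), handler)

# Encode only at output time, after the chunks have been joined.
print([block.encode("ISO-8859-1") for block in handler.blocks])
```

However the parser happens to split the text, the joined result is the same, because the encoding step only runs once on the complete string.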

I recommend using ElementTree instead.
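
For comparison, here is roughly what the same task looks like with ElementTree (a minimal sketch using xml.etree.ElementTree from the Python 3 standard library; the sample document is invented). The element's text arrives as one string, so there is no chunking to deal with.

```python
import xml.etree.ElementTree as ET

# Sample input, made up for illustration.
root = ET.fromstring("<foo>caf\u00e9 123</foo>".encode("utf-8"))

text = root.text                      # the whole text block as one str
encoded = text.encode("ISO-8859-1")   # encode only when bytes are needed
print(encoded)
```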

- Brian