Matt Nordhoff
1/16/2008 3:47:00 AM
BerlinBrown wrote:
> With this code, ignore/replace still generate an error
>
> # Encode to simple ascii format.
> field.full_content = field.full_content.encode('ascii', 'replace')
>
> Error:
>
> [0/1] 'ascii' codec can't decode byte 0xe2 in position 14317: ordinal
> not in range(128)
>
> The document in question is a Wikipedia document. I believe they use
> latin-1 unicode or something similar. I thought replace and ignore
> were supposed to replace and ignore?
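Is field.full_content a str or a unicode? You probably haven't decoded
it from a byte string yet. In Python 2, calling .encode() on a str
first decodes it implicitly as ASCII, in strict mode. That is why the
error says "can't decode", and why 'replace' never gets a chance to
apply: it only affects the encode step.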
>>> field.full_content = field.full_content.decode('utf8', 'replace')
>>> field.full_content = field.full_content.encode('ascii', 'replace')
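A slightly fuller sketch of the same two steps, in case it helps. The
helper name to_ascii and the sample bytes are made up for illustration,
and I'm assuming the input really is UTF-8 (the 0xe2 byte in the error
hints at that, but it isn't proof):

```python
def to_ascii(raw):
    """Decode UTF-8 bytes to text, then encode that text as ASCII.

    Hypothetical helper; assumes `raw` is a byte string in UTF-8.
    """
    # Step 1: bytes -> text. Invalid sequences become U+FFFD.
    text = raw.decode("utf-8", "replace")
    # Step 2: text -> ASCII bytes. Each non-ASCII character becomes "?".
    return text.encode("ascii", "replace")


# Made-up sample: 0xe2 0x80 0x99 is a right single quote in UTF-8.
sample = b"It\xe2\x80\x99s a test"
result = to_ascii(sample)
```

Here result should come out as the bytes "It?s a test": the three-byte
quote is one character once decoded, so it turns into a single "?".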
Why do you want to use ASCII? UTF-8 is great. :-)