comp.lang.python

Re: Unicode/UTF-8 confusion

Matt Nordhoff

3/15/2008 7:00:00 PM

Tom Stambaugh wrote:
> I'm still confused about this, even after days of hacking at it. It's
> time I asked for help. I understand that each of you knows more about
> Python, Javascript, unicode, and programming than me, and I understand
> that each of you has a higher SAT score than me. So please try and be
> gentle with your responses.
>
> I use simplejson to serialize html strings that the server is delivering
> to a browser. Since the apostrophe is a string terminator in javascript,
> I need to escape any apostrophe embedded in the html.
>
> Just to be clear, the specific unicode character I'm struggling with is
> described in Python as:
> u'\N{APOSTROPHE}'. It has a standardized utf-8 value (according to, for
> example, http://www.fileformat.info/info/unicode/char/0027...)
> of 0x27.
>
> This can be expressed in several common ways:
> hex: 0x27
> Python literal: u"\u0027"
>
> Suppose I start with some test string that contains an embedded
> apostrophe -- for example: u"   '   ". I believe that the appropriate
> json serialization of this is (presented as a list to eliminate notation
> ambiguities):
>
> ['"', ' ', ' ', ' ', '\\', '\\', '0', '0', '2', '7', ' ', ' ', ' ', '"']
>
> This is a 14-character utf-8 serialization of the above test string.
>
> I know I can brute-force this, using something like the following:
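> import simplejson
>
> def encode(aRawString):
>     # Build the replacement text character by character
>     aReplacement = ''.join(['\\', '0', '0', '2', '7'])
>     # Swap every apostrophe for the replacement, then serialize
>     aCookedString = aRawString.replace("'", aReplacement)
>     answer = simplejson.dumps(aCookedString)
>     return answer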
>
> I can't even make mailers let me *TYPE* a string literal for the
> replacement string without trying to turn it into an HTML link!
>
> Anyway, I know that my "encode" function works, but it pains me to add
> that "replace" call before *EVERY* invocation of the simplejson.dumps()
> method. The reason I upgraded to 1.7.4 was to get the c-level speedup
> routine now offered by simplejson -- yet the need to do this apostrophe
> escaping seems to negate this advantage! Is there perhaps some
> combination of dumps keyword arguments, python encode()/str() magic, or
> something similar that accomplishes this same result?
>
> What is the highest-performance way to get simplejson to emit the
> desired serialization of the given test string?

simplejson handles all necessary escaping of stuff like quotes...
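
A minimal sketch of what that looks like in practice. It uses the standard-library json module as a stand-in for simplejson (an assumption made so the snippet runs without extra installs; both expose dumps() and loads()). JSON strings are delimited by double quotes, so dumps() escapes embedded double quotes and leaves an apostrophe alone. If the apostrophe really must not appear in the output, one option is to substitute it after serializing instead of before. The helper name dumps_for_single_quoted_js and the sample html string are made up for illustration.

```python
import json  # stand-in for simplejson (assumption): same dumps()/loads() interface


def dumps_for_single_quoted_js(value):
    # Hypothetical helper, not part of json or simplejson.
    # Serialize first, then rewrite each apostrophe as a backslash-u escape.
    # In dumps() output an apostrophe can only occur inside a string value,
    # so the replacement does not touch the JSON structure.
    return json.dumps(value).replace("'", "\\u0027")


# Sample HTML fragment with both kinds of quotes (illustrative only).
html = u"<a href=\"#\">it's</a>"

plain = json.dumps(html)                    # double quotes escaped, apostrophe kept
escaped = dumps_for_single_quoted_js(html)  # apostrophe written as an escape

print(plain)
print(escaped)

# Both forms decode back to the original string.
assert json.loads(plain) == html
assert json.loads(escaped) == html
```

Doing the replacement on the serialized text means the call to dumps() itself stays untouched, and the data is never altered before encoding.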