comp.lang.python

Unicode blues in Python3

nn

3/23/2010 5:34:00 PM

I know that unicode is the way to go in Python 3.1, but it is getting
in my way right now in my Unix scripts. How do I write a chr(253) to a
file?

#nntst2.py
import sys,codecs
mychar=chr(253)
print(sys.stdout.encoding)
print(mychar)

> ./nntst2.py
ISO8859-1
ý

> ./nntst2.py >nnout2
Traceback (most recent call last):
  File "./nntst2.py", line 5, in <module>
    print(mychar)
UnicodeEncodeError: 'ascii' codec can't encode character '\xfd' in position 0: ordinal not in range(128)

> cat nnout2
ascii

...Oh great!

OK, let's try this:
#nntst3.py
import sys,codecs
mychar=chr(253)
print(sys.stdout.encoding)
print(mychar.encode('latin1'))

> ./nntst3.py
ISO8859-1
b'\xfd'

> ./nntst3.py >nnout3

> cat nnout3
ascii
b'\xfd'

...Eh... not what I want really.

#nntst4.py
import sys,codecs
mychar=chr(253)
print(sys.stdout.encoding)
sys.stdout=codecs.getwriter("latin1")(sys.stdout)
print(mychar)

> ./nntst4.py
ISO8859-1
Traceback (most recent call last):
  File "./nntst4.py", line 6, in <module>
    print(mychar)
  File "Python-3.1.2/Lib/codecs.py", line 356, in write
    self.stream.write(data)
TypeError: must be str, not bytes

...OK, this is not working either.

Is there any way to write a value 253 to standard output?
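
The nntst4.py attempt fails because a codecs writer produces bytes, while the text-mode sys.stdout in Python 3 accepts only str. A minimal sketch of one way around this is to wrap a binary stream in a text layer with an explicit encoding. The helper name make_latin1_writer is hypothetical, and an in-memory io.BytesIO stands in for the binary stream here (in a script one would presumably pass sys.stdout.buffer instead):

```python
import io


def make_latin1_writer(binary_stream):
    """Wrap a binary stream so that str written to it is encoded as Latin-1.

    Hypothetical helper for illustration; newline="\n" disables newline
    translation so the bytes written are predictable.
    """
    return io.TextIOWrapper(binary_stream, encoding="latin-1", newline="\n")


# Stand-in for sys.stdout.buffer so the result can be inspected.
raw = io.BytesIO()
out = make_latin1_writer(raw)

print(chr(253), file=out)  # print() still works, it just targets the wrapper
out.flush()                # push the encoded bytes down to the binary stream

result = raw.getvalue()    # the raw bytes that reached the binary stream
```

The point of the wrapper is that the encoding no longer depends on whether output goes to a terminal or a file.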
14 Answers

Rami Chowdhury

3/23/2010 6:00:00 PM

nn

3/23/2010 6:10:00 PM



Rami Chowdhury wrote:
> On Tuesday 23 March 2010 10:33:33 nn wrote:
> > I know that unicode is the way to go in Python 3.1, but it is getting
> > in my way right now in my Unix scripts. How do I write a chr(253) to a
> > file?
> >
> > #nntst2.py
> > import sys,codecs
> > mychar=chr(253)
> > print(sys.stdout.encoding)
> > print(mychar)
>
> The following code works for me:
>
> $ cat nnout5.py
> #!/usr/bin/python3.1
>
> import sys
> mychar = chr(253)
> sys.stdout.write(mychar)
> $ echo $(cat nnout)
> ý
>
> Can I ask why you're using print() in the first place, rather than writing
> directly to a file? Python 3.x, AFAIK, distinguishes between text and binary
> files and will let you specify the encoding you want for strings you write.
>
> Hope that helps,
> Rami

#nntst5.py
import sys
mychar=chr(253)
sys.stdout.write(mychar)

> ./nntst5.py >nnout5
Traceback (most recent call last):
  File "./nntst5.py", line 4, in <module>
    sys.stdout.write(mychar)
UnicodeEncodeError: 'ascii' codec can't encode character '\xfd' in position 0: ordinal not in range(128)

That fails the same way, so it is equivalent to print.

I use print so I can do tests and debug runs to the screen or pipe it
to some other tool and then configure the production bash script to
write the final output to a file of my choosing.
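
For a script that must keep using print() and have the shell decide where output goes, one option that may fit is the PYTHONIOENCODING environment variable, which overrides the encoding Python picks for stdin/stdout/stderr. The sketch below is an assumption-laden demonstration, not something from the thread: it uses subprocess.run (which is newer than Python 3.1) to start a child interpreter with the variable set and captures the raw bytes it prints. The function name run_with_io_encoding is hypothetical.

```python
import os
import subprocess
import sys


def run_with_io_encoding(code, encoding):
    """Run `code` in a child interpreter with PYTHONIOENCODING set.

    Returns the child's stdout as raw bytes. Hypothetical helper for
    illustration only.
    """
    env = dict(os.environ, PYTHONIOENCODING=encoding)
    proc = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        stdout=subprocess.PIPE,  # stdout is a pipe, not a terminal
        check=True,
    )
    return proc.stdout


# The child uses plain print(); only the environment differs.
captured = run_with_io_encoding("print(chr(253))", "latin-1")
```

In a bash script the same idea would be written as PYTHONIOENCODING=latin-1 ./nntst2.py >nnout2, which leaves the Python source untouched.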

Gary Herron

3/23/2010 6:11:00 PM

nn wrote:
> I know that unicode is the way to go in Python 3.1, but it is getting
> in my way right now in my Unix scripts. How do I write a chr(253) to a
> file?
>

Python 3 makes a distinction between bytes and string (i.e., unicode)
types, and you are still thinking in the Python 2 mode that does *NOT*
make such a distinction. What you appear to want is to write a
particular byte to a file -- so use the bytes type and a file opened in
binary mode:

>>> b=bytes([253])
>>> f = open("abc", 'wb')
>>> f.write(b)
1
>>> f.close()

On Unix (at least), the "od" program can verify that the contents are correct:
> od abc -d
0000000 253
0000001


Hope that helps.

Gary Herron
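
The same binary-mode write can be sketched as a small script that also reads the file back, as a Python-side alternative to checking with od. The helper name write_single_byte and the temporary directory are assumptions made for the example; the file name "abc" follows the session above.

```python
import os
import tempfile


def write_single_byte(path, value):
    """Write one byte with the given integer value (0-255) in binary mode.

    Returns the number of bytes written. Hypothetical helper for illustration.
    """
    with open(path, "wb") as f:  # "wb": no encoding step is involved at all
        return f.write(bytes([value]))


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "abc")
    written = write_single_byte(path, 253)

    # Read the file back in binary mode to confirm its contents.
    with open(path, "rb") as f:
        contents = f.read()
```

Using a with block closes the file even if the write raises, which the interactive session handles with an explicit f.close().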





nn

3/23/2010 6:47:00 PM



Gary Herron wrote:
> nn wrote:
> > I know that unicode is the way to go in Python 3.1, but it is getting
> > in my way right now in my Unix scripts. How do I write a chr(253) to a
> > file?
> >
>
> Python 3 makes a distinction between bytes and string (i.e., unicode)
> types, and you are still thinking in the Python 2 mode that does *NOT*
> make such a distinction. What you appear to want is to write a
> particular byte to a file -- so use the bytes type and a file opened in
> binary mode:
>
> >>> b=bytes([253])
> >>> f = open("abc", 'wb')
> >>> f.write(b)
> 1
> >>> f.close()
>
> On Unix (at least), the "od" program can verify that the contents are correct:
> > od abc -d
> 0000000 253
> 0000001
>
>
> Hope that helps.
>
> Gary Herron

Actually what I want is to write a particular byte to standard output,
and I want this to work regardless of where that output gets sent to.
I am aware that I could do
open('nnout','w',encoding='latin1').write(mychar) but I am porting a
python2 program and don't want to rewrite everything that uses that
script.
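
The reason encoding='latin1' gives the desired byte is that Latin-1 maps each code point from 0 to 255 onto the byte with the same value. A short check of that property, written as a sketch with illustrative variable names:

```python
# Every code point 0..255 encodes to the single byte of the same value.
all_match = all(chr(n).encode("latin-1") == bytes([n]) for n in range(256))

# Decoding and re-encoding all 256 byte values is lossless.
every_byte = bytes(range(256))
round_trip_ok = every_byte.decode("latin-1").encode("latin-1") == every_byte

# Code points above 255 have no Latin-1 representation.
try:
    chr(256).encode("latin-1")
    out_of_range_fails = False
except UnicodeEncodeError:
    out_of_range_fails = True
```

This is why Python 2 code that treated byte strings as text can often be ported by decoding and encoding with Latin-1 at the boundaries, provided the data really is limited to 8-bit values.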

Stefan Behnel

3/23/2010 7:58:00 PM

nn, 23.03.2010 19:46:
> Actually what I want is to write a particular byte to standard output,
> and I want this to work regardless of where that output gets sent to.
> I am aware that I could do
> open('nnout','w',encoding='latin1').write(mychar) but I am porting a
> python2 program and don't want to rewrite everything that uses that
> script.

Are you writing text or binary data to stdout?

Stefan

nn

3/23/2010 8:36:00 PM



Stefan Behnel wrote:
> nn, 23.03.2010 19:46:
> > Actually what I want is to write a particular byte to standard output,
> > and I want this to work regardless of where that output gets sent to.
> > I am aware that I could do
> > open('nnout','w',encoding='latin1').write(mychar) but I am porting a
> > python2 program and don't want to rewrite everything that uses that
> > script.
>
> Are you writing text or binary data to stdout?
>
> Stefan

latin1 charset text.

Martin v. Loewis

3/23/2010 10:43:00 PM

nn wrote:
>
> Stefan Behnel wrote:
>> nn, 23.03.2010 19:46:
>>> Actually what I want is to write a particular byte to standard output,
>>> and I want this to work regardless of where that output gets sent to.
>>> I am aware that I could do
>>> open('nnout','w',encoding='latin1').write(mychar) but I am porting a
>>> python2 program and don't want to rewrite everything that uses that
>>> script.
>> Are you writing text or binary data to stdout?
>>
>> Stefan
>
> latin1 charset text.

Are you sure about that? If you carefully reconsider, could you come to
the conclusion that you are not writing text at all, but binary data?

If it really was text that you write, why do you need to use
U+00FD (LATIN SMALL LETTER Y WITH ACUTE)? To my knowledge, that
character is really infrequently used in practice. The fact that you
try to write it strongly suggests that what you are writing is not
actually text.

Also, your formulation suggests the same:

"Is there any way to write a value 253 to standard output?"

If you were really writing text, you'd ask


"Is there any way to write 'ý' to standard output?"

Regards,
Martin
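
The text-versus-binary distinction raised here can be made concrete with a small sketch: chr(253) is a one-character str naming a Unicode character, and which bytes it becomes depends entirely on the encoding chosen. Variable names are illustrative.

```python
import unicodedata

ch = chr(253)                        # a str holding one character, U+00FD
name = unicodedata.name(ch)          # the character's official Unicode name

latin1_bytes = ch.encode("latin-1")  # one byte with value 253
utf8_bytes = ch.encode("utf-8")      # two bytes, neither of them 253

# The numeric value 253 as binary data, with no character semantics at all.
raw_byte = bytes([253])
```

Only the Latin-1 encoding happens to coincide with the raw byte; under UTF-8 the same character becomes a different, longer byte sequence.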

Steven D'Aprano

3/24/2010 4:41:00 AM

On Tue, 23 Mar 2010 11:46:33 -0700, nn wrote:

> Actually what I want is to write a particular byte to standard output,
> and I want this to work regardless of where that output gets sent to.

What do you mean "work"?

Do you mean "display a particular glyph" or something else?

In bash:

$ echo -e "\0101" # octal 101 = decimal 65
A
$ echo -e "\0375" # decimal 253
�

but if I change the terminal encoding, I get this:

$ echo -e "\0375"
ý

Or this:

$ echo -e "\0375"
²

depending on which encoding I use.

I think your question is malformed. You need to work out what behaviour
you actually want, before you can ask for help on how to get it.



--
Steven
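
The terminal experiment above can be approximated in Python by decoding the same byte under different encodings. This is a sketch only: the post does not say which terminal encodings were used, so Latin-1, code page 437, and UTF-8 are assumptions chosen because they reproduce the three glyphs shown.

```python
raw = bytes([0o375])  # octal 375 == decimal 253 == 0xFD

as_latin1 = raw.decode("latin-1")  # y with acute accent
as_cp437 = raw.decode("cp437")     # superscript two
# 0xFD is not valid UTF-8, so it decodes to the replacement character.
as_utf8 = raw.decode("utf-8", errors="replace")
```

The byte never changes; only its interpretation as a glyph does.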

nn

3/24/2010 1:04:00 PM



Martin v. Loewis wrote:
> nn wrote:
> >
> > Stefan Behnel wrote:
> >> nn, 23.03.2010 19:46:
> >>> Actually what I want is to write a particular byte to standard output,
> >>> and I want this to work regardless of where that output gets sent to.
> >>> I am aware that I could do
> >>> open('nnout','w',encoding='latin1').write(mychar) but I am porting a
> >>> python2 program and don't want to rewrite everything that uses that
> >>> script.
> >> Are you writing text or binary data to stdout?
> >>
> >> Stefan
> >
> > latin1 charset text.
>
> Are you sure about that? If you carefully reconsider, could you come to
> the conclusion that you are not writing text at all, but binary data?
>
> If it really was text that you write, why do you need to use
> U+00FD (LATIN SMALL LETTER Y WITH ACUTE). To my knowledge, that
> character is really infrequently used in practice. So that you try to
> write it strongly suggests that it is not actually text what you are
> writing.
>
> Also, your formulation suggests the same:
>
> "Is there any way to write a value 253 to standard output?"
>
> If you would really be writing text, you'd ask
>
>
> "Is there any way to write 'ý' to standard output?"
>
> Regards,
> Martin

To be more informative: I am writing both text and binary data
together. That is, I am embedding text from another source into a stream
that uses non-ASCII characters as "control" characters. In Python 2 I
was processing it mostly as text containing a few "funny" characters.

nn

3/24/2010 1:08:00 PM



Steven D'Aprano wrote:
> On Tue, 23 Mar 2010 11:46:33 -0700, nn wrote:
>
> > Actually what I want is to write a particular byte to standard output,
> > and I want this to work regardless of where that output gets sent to.
>
> What do you mean "work"?
>
> Do you mean "display a particular glyph" or something else?
>
> In bash:
>
> $ echo -e "\0101" # octal 101 = decimal 65
> A
> $ echo -e "\0375" # decimal 253
> �
>
> but if I change the terminal encoding, I get this:
>
> $ echo -e "\0375"
> ý
>
> Or this:
>
> $ echo -e "\0375"
> ²
>
> depending on which encoding I use.
>
> I think your question is malformed. You need to work out what behaviour
> you actually want, before you can ask for help on how to get it.
>
>
>
> --
> Steven

Yes, sorry, it is a bit ambiguous. I don't really care what the glyph is;
the program reading my output reads 8-bit values, expects the binary
value 0xFD as a control character, and lets everything else through as is.
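
Given that requirement, one possible approach is to treat the whole output as bytes: encode the text parts as Latin-1, insert the control byte explicitly, and write to the binary layer under standard output. The sketch below rests on assumptions not stated in the thread: that 0xFD acts as a separator between text chunks, and the names CONTROL, build_stream, and emit are all hypothetical.

```python
import io
import sys

CONTROL = bytes([0xFD])  # the control byte the downstream reader expects


def build_stream(chunks):
    """Join text chunks with the control byte, encoding text as Latin-1.

    Assumes (hypothetically) that the control byte separates chunks.
    """
    return CONTROL.join(chunk.encode("latin-1") for chunk in chunks)


def emit(data, stream=None):
    """Write bytes to a binary stream, defaulting to stdout's binary layer."""
    if stream is None:
        stream = sys.stdout.buffer  # bypasses the text encoding entirely
    stream.write(data)
    stream.flush()


# Demonstrate against an in-memory sink so the bytes can be inspected.
sink = io.BytesIO()
emit(build_stream(["first", "second"]), sink)
payload = sink.getvalue()
```

Calling emit(data) with no stream argument would send the same bytes to standard output whether it is a terminal, a pipe, or a file. If print() is also used in the same script, flushing sys.stdout before writing to sys.stdout.buffer keeps the output in order.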