Asp Forum - Re: String#upcase/downcase with UTF-8 strings in Ruby 1.9

Yukihiro Matsumoto

7/9/2008 11:25:00 PM

Hi,

In message "Re: String#upcase/downcase with UTF-8 strings in Ruby 1.9"
on Thu, 10 Jul 2008 07:09:29 +0900, "Stefan Schmidt" <Stefan.Schmidt@gmx.net> writes:

|in Ruby 1.9 I get the following behaviour:
|
|>> "aoueäöüé".upcase
|=> "AOUEäöüé"
|>> "AOUEÄÖÜÉ".downcase
|=> "aoueÄÖÜÉ"
|
|I can't however find a bug in the bug tracking system.
|Doesn't this qualify as a bug?

The document for String#upcase says:

call-seq:
str.upcase => new_str

Returns a copy of <i>str</i> with all lowercase letters replaced with their
uppercase counterparts. The operation is locale insensitive---only
characters ``a'' to ``z'' are affected.
Note: case replacement is effective only in ASCII region.

"hEllO".upcase #=> "HELLO"

See "Note:". Tim Bray has persuaded me to do so, since case
conversion outside of ASCII region is highly dependent on country,
language, culture and script.

matz.
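
The documented ASCII-only behaviour quoted above can be illustrated with a minimal sketch. The helper names `ascii_upcase` and `ascii_downcase` are hypothetical, and `String#tr` with explicit ASCII ranges is used here as a stand-in for the conversion described in the documentation, on the assumption that it behaves the same way regardless of how a given Ruby release implements `upcase` itself.

```ruby
# encoding: utf-8

# Hypothetical helpers: convert only the ASCII letters and leave
# every other character (for example "ä" or "É") unchanged.
def ascii_upcase(str)
  str.tr("a-z", "A-Z")
end

def ascii_downcase(str)
  str.tr("A-Z", "a-z")
end

puts ascii_upcase("aoueäöüé")    # prints AOUEäöüé
puts ascii_downcase("AOUEÄÖÜÉ")  # prints aoueÄÖÜÉ
```

The output matches what the original poster observed: the accented letters pass through untouched.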

11 Answers

John Joyce

7/10/2008 12:17:00 AM

On Jul 9, 2008, at 6:25 PM, Yukihiro Matsumoto wrote:

> Hi,
>
> In message "Re: String#upcase/downcase with UTF-8 strings in Ruby 1.9"
> on Thu, 10 Jul 2008 07:09:29 +0900, "Stefan Schmidt" <Stefan.Schmidt@gmx.net> writes:
>
> |in Ruby 1.9 I get the following behaviour:
> |
> |>> "aoueäöüé".upcase
> |=> "AOUEäöüé"
> |>> "AOUEÄÖÜÉ".downcase
> |=> "aoueÄÖÜÉ"
> |
> |I can't however find a bug in the bug tracking system.
> |Doesn't this qualify as a bug?
>
> The document for String#upcase says:
>
> call-seq:
> str.upcase => new_str
>
> Returns a copy of <i>str</i> with all lowercase letters replaced
> with their
> uppercase counterparts. The operation is locale insensitive---only
> characters ``a'' to ``z'' are affected.
> Note: case replacement is effective only in ASCII region.
>
> "hEllO".upcase #=> "HELLO"
>
> See "Note:". Tim Bray has persuaded me to do so, since case
> conversion outside of ASCII region is highly dependent on country,
> language, culture and script.
>
> matz.
>
This leaves the perfect opening for people to contribute locale or
language specific extensions to String.
It would make a great gem with a plug-in architecture.
Just add options for the language you want to use.
In any case it can get very tricky to do character conversions with
different languages.
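
As a rough illustration of the plug-in idea above, here is a minimal sketch of a per-language upcase. Everything in it is an assumption made for illustration: the module name `LocaleCase`, the `TABLES` constant, and the language keys are hypothetical, and the two tables are deliberately tiny rather than complete. It is not an existing gem or library API.

```ruby
# encoding: utf-8

# Hypothetical module: each language registers its own table of
# special cases; anything not listed falls back to ASCII-only upcase.
module LocaleCase
  TABLES = {
    :de => { "ä" => "Ä", "ö" => "Ö", "ü" => "Ü", "ß" => "SS" },
    :tr => { "i" => "İ", "ı" => "I" }
  }

  def self.upcase(str, lang)
    table = TABLES.fetch(lang, {})
    # Walk the string one character at a time so that a language
    # table can override the default mapping for single letters.
    str.gsub(/./m) { |ch| table.fetch(ch) { ch.tr("a-z", "A-Z") } }
  end
end

puts LocaleCase.upcase("straße", :de)
puts LocaleCase.upcase("istanbul", :tr)
```

The same input letter can map to different results depending on the language: a plain "i" becomes "İ" under the Turkish table and "I" otherwise, which is the kind of dependence on language that makes a single built-in rule hard to get right.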

Stefan Schmidt

7/10/2008 1:17:00 AM

> The document for String#upcase says:

Yes, sorry, I should have read the documentation

> See "Note:". Tim Bray has persuaded me to do so, since case
> conversion outside of ASCII region is highly dependent on country,
> language, culture and script.

So basically the Python guys are going down a wrong route ?

# -*- coding: utf-8 -*-
import string
print string.upper(u"aoueäöüé")
print string.lower(u"AOUEÄÖÜÉ")

works as expected.

Cheers, Stefan

John Joyce

7/10/2008 1:25:00 AM

On Jul 9, 2008, at 8:17 PM, Stefan Schmidt wrote:

>> The document for String#upcase says:
>
> Yes, sorry, I should have read the documentation
>
>> See "Note:". Tim Bray has persuaded me to do so, since case
>> conversion outside of ASCII region is highly dependent on country,
>> language, culture and script.
>
> So basically the Python guys are going down a wrong route ?
>
> # -*- coding: utf-8 -*-
> import string
> print string.upper(u"aoueäöüé")
> print string.lower(u"AOUEÄÖÜÉ")
>
> works as expected.
>
> Cheers, Stefan
>
No.
They're going down a different route.
Seriously, the language handling is something that could easily be
handled by extensions. It does not need to be a core part of the
language.
Even operating systems handle these things with proprietary and very
sophisticated techniques based on the language in question.
In most cases, what you are expecting to be the correct upper case
characters may be 'correct' but it will ultimately depend on the
language and the context.

Stefan Schmidt

7/10/2008 3:39:00 PM

> > So basically the Python guys are going down a wrong route ?
> >
> > # -*- coding: utf-8 -*-
> > import string
> > print string.upper(u"aoueäöüé")
> > print string.lower(u"AOUEÄÖÜÉ")
> >
> > works as expected.
> >
> > Cheers, Stefan
> >
> No.
> They're going down a different route.
> Seriously, the language handling is something that could easily be
> handled by extensions. It does not need to be a core part of the
> language.

Is Nikolai Weibull's Ruby Character Encodings Library [1] currently the best way to go?

Stefan

[1] http://bitwi.se/software/ruby/character-...

Stefan Schmidt

7/11/2008 5:30:00 AM

> Seriously, the language handling is something that could easily be
> handled by extensions. It does not need to be a core part of the
> language.

Are there any working extensions for Ruby 1.9 that offer Unicode support for String#downcase/upcase and/or Array#sort?

Stefan


comp.lang.ruby
