comp.lang.ruby

UTF-8 and printf

Jose

12/5/2007 10:20:00 AM

Hello,

I'm trying to use printf to lay out my output in columns, for
example:

printf "%20s %10s %10s", title, author, date

where title, author and date are string variables. The text in these
variables is UTF-8 encoded, and this makes printf misalign the
output. The reason is that a multi-byte character (for example, a
two-byte one) is counted as several characters (two), so the number
of spaces needed for padding is calculated incorrectly.
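
To make the miscount concrete, here is a minimal sketch. The sample string is invented for illustration; it compares the number of bytes, which is what Ruby 1.8 effectively pads by, with the number of UTF-8 code points:

```ruby
# -*- coding: utf-8 -*-
# Hypothetical sample value; "é" occupies two bytes in UTF-8.
title = "Café"

byte_count = title.unpack("C*").length  # one entry per byte
char_count = title.unpack("U*").length  # one entry per UTF-8 code point

puts "bytes: #{byte_count}, characters: #{char_count}"
```

Any field width computed from the byte count comes out one column short for every extra byte.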

I searched for discussions about Ruby and UTF-8, and in general it
does not look like an easy issue. I read about the String#chars proxy
introduced by Rails, but I'm not using Rails, and besides I don't
think it would help here.

Do you know of any solution to my problem? Using printf is not a
requirement; all I want is to align the output in columns without
using "\t".

Thanks in advance,
--Jose
5 Answers

MonkeeSage

12/5/2007 1:58:00 PM

Unfortunately, I think you'll have to use something ugly like this...

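# Right-align s in a field of n characters. unpack("U*") yields one
# integer per UTF-8 code point, so its length is the character count.
def pad(n, s)
  (" " * (n - s.unpack("U*").length)) + s
end

# Each argument is a [width, string] pair.
def padded(*elems)
  elems.map { |width, str| pad(width, str) }.join(" ")
end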

puts padded([20, title], [10, author], [10, date])
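
A self-contained variant of the same idea, as a sketch rather than a drop-in replacement: it clamps the padding at zero, so a string longer than its field is printed unpadded (as printf would do) instead of raising on a negative repeat count. The sample values are invented.

```ruby
# -*- coding: utf-8 -*-
# Right-align s in a field of n characters, counting code points, not bytes.
def pad(n, s)
  width = s.unpack("U*").length
  (" " * [n - width, 0].max) + s  # clamp so over-long strings do not raise
end

# Each argument is a [width, string] pair.
def padded(*elems)
  elems.map { |n, s| pad(n, s) }.join(" ")
end

# Invented sample values for illustration.
row = padded([10, "Rayuela"], [10, "Cortázar"], [6, "1963"])
puts row
```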

Regards,
Jordan

Jose

12/7/2007 12:55:00 AM

On 5 Dec, 14:58, MonkeeSage <MonkeeS...@gmail.com> wrote:
> Unfortunately, I think you'll have to use something ugly like this...
>
> def pad(n, s)
> (" " * (n - s.unpack("U*").length)) + s
> end
> [...]

Hey, thank you very much. The unpack trick for finding the string
length is a nice one. And s.unpack("U*").length is only about 4 times
slower than s.length, according to my benchmarks.
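
A comparison like that can be sketched with the standard Benchmark module. The sample string and iteration count below are arbitrary, and the ratio depends on the Ruby version and the string; note that on Ruby 1.9 and later String#length already counts characters, so the two calls are only measuring different things on 1.8.

```ruby
# -*- coding: utf-8 -*-
require "benchmark"

s = "Cortázar " * 10  # invented sample text
n = 20_000            # arbitrary iteration count

plain  = Benchmark.realtime { n.times { s.length } }
unpack = Benchmark.realtime { n.times { s.unpack("U*").length } }

puts "s.length:              %.4f s" % plain
puts 's.unpack("U*").length: %.4f s' % unpack
```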

Does anybody know whether this printf "bug" will be fixed in Ruby 1.9?

Regards,
--Jose

MonkeeSage

12/7/2007 1:07:00 AM

On Dec 6, 6:54 pm, Jose <jld...@gmail.com> wrote:
> Anybody knows this printf "bug" will be solved in ruby 1.9?
>
> Regards,
> --Jose

Not really a bug, just that 1.8 doesn't have native Unicode support.
But yes, in Ruby 1.9 strings carry their own encoding, so with UTF-8
as the default encoding printf Just Works (you can also relabel a
string as UTF-8 with String#force_encoding if you're using a
different default encoding, and printf does the right thing). :)
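
For illustration, a minimal sketch of that behaviour on Ruby 1.9 or later. The sample value is invented, and the second half assumes the bytes arrived tagged as binary, for example from a socket or file read:

```ruby
# -*- coding: utf-8 -*-
author = "Cortázar"  # invented sample: 8 characters, 9 bytes in UTF-8

# The field width is counted in characters, not bytes.
line = format("%10s|", author)
puts line

# Same bytes, but tagged as binary; relabel them as UTF-8 before formatting.
raw = "Cort\xC3\xA1zar".dup.force_encoding("BINARY")
fixed = format("%10s|", raw.force_encoding("UTF-8"))
puts fixed
```

force_encoding only changes the label on the string; it does not convert any bytes.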

Regards,
Jordan

Jose

12/7/2007 1:11:00 AM

On 7 Dec, 02:06, MonkeeSage <MonkeeS...@gmail.com> wrote:

> > Anybody knows this printf "bug" will be solved in ruby 1.9?
>
> Not really a bug, just that 1.8 doesn't have native unicode support.

I understand. That's why I put "bug" in double quotes.

> But, yes, in ruby 1.9 you have a native utf-8 type, so with default
> utf-8 encoding, printf Just Works


Great!

I'm still intrigued by the poor UTF-8 support in current and past
versions, especially considering that Ruby was developed in Japan.
Anyway, this is good news.

Thanks for answering,
--Jose

MonkeeSage

12/7/2007 7:42:00 AM

On Dec 6, 7:11 pm, Jose <jld...@gmail.com> wrote:
> I'm still intrigued about the poor utf8 support in current and past
> versions, specially taking into account that ruby was developed in
> Japan. Anyway, these are good news.

IIRC, Ruby wasn't created with Unicode support because Unicode is less
efficient at representing East Asian character sets than other
encodings like Shift_JIS/EUC-JP (something to the effect of UTF-8
needing three bytes for most Japanese characters that take two bytes
in those other encodings).
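
As a rough check of the size difference, here is a sketch for Ruby 1.9 or later that encodes the same text several ways and prints the byte counts. The sample string is invented.

```ruby
# -*- coding: utf-8 -*-
text = "日本語"  # invented sample: three kanji

# Byte size of the same three characters in each encoding.
sizes = {}
%w[UTF-8 Shift_JIS EUC-JP UTF-16LE].each do |name|
  sizes[name] = text.encode(name).bytesize
  puts "#{name}: #{sizes[name]} bytes"
end
```

For this sample, UTF-8 needs three bytes per character while the other three encodings need two.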

> Thanks for answering,
> --Jose

No problem. :)

Regards,
Jordan