
comp.lang.ruby

PDF::Writer and Unicode

Xavier Noria

2/16/2007 11:07:00 AM

According to the current manual PDF documents generated by
PDF::Writer can use UTF-16BE, but after a few trials with iconv I
can't get my UTF-8 strings right. Example:

$KCODE = 'u'

require 'rubygems'
require 'pdf/writer'
require 'iconv'

str = Iconv.iconv('UTF-16BE', 'UTF-8', 'á ß €')
pdf = PDF::Writer.new

# renders á and ß right, but not €
pdf.text str

# same output with garbage prepended
pdf.text "\xfe\xff#{str}"
pdf.save_as('unicode_test.pdf')

The manual does not document whether any encoding is needed for
select_font. I've played around with variations of

# gives complete garbage
pdf.select_font 'Times-Roman', :encoding => 'UTF-16BE'

without luck.

TextMate is generating UTF-8 source files for sure. Any ideas?

-- fxn


6 Answers

Vincent Fourmond

2/16/2007 11:59:00 AM


Xavier Noria wrote:
> The manual does not document whether any encoding is needed for
> select_font. I've played around with variations of
>
> # gives complete garbage
> pdf.select_font 'Times-Roman', :encoding => 'UTF-16BE'
>
> without luck.

I'm not familiar with PDF::Writer, but I would be surprised if you
really had all the glyphs for 'UTF-16BE' by default. What is the exact
output? Does it produce the PDF file, or does it simply fail with an
exception, or crash?

If a PDF file is produced (of reasonable size), would you mind posting
it?

Cheers,

Vince


--
Vincent Fourmond, PhD student (not for long anymore)
http://vincent.fourmon...

Xavier Noria

2/16/2007 1:21:00 PM




Vincent Fourmond

2/16/2007 1:50:00 PM


Xavier Noria wrote:
> On Feb 16, 2007, at 12:59 PM, Vincent Fourmond wrote:
>
>> Xavier Noria wrote:
>>> The manual does not document whether any encoding is needed for
>>> select_font. I've played around with variations of
>>>
>>> # gives complete garbage
>>> pdf.select_font 'Times-Roman', :encoding => 'UTF-16BE'
>>>
>>> without luck.
>>
>> I'm not familiar with PDF::Writer, but I would be surprised if you
>> really had all the glyphs for 'UTF-16BE' by default. What is the exact
>> output? Does it produce the PDF file, or does it simply fail with an
>> exception, or crash?
>>
>> If a PDF file is produced (of reasonable size), would you mind posting
>> it?
>
> Sure, it's just 4KB. This is the PDF generated by
>
> $KCODE = 'u'
>
> require 'rubygems'
> require 'pdf/writer'
> require 'iconv'
>
> str = Iconv.iconv('UTF-16BE', 'UTF-8', 'á ß €')
> pdf = PDF::Writer.new
> pdf.text str
> pdf.text "\xfe\xff#{str}"
> pdf.save_as('unicode_test.pdf')
>
> As you see, the glyph we get wrong in this small test is the euro
> symbol. This is important to me because not only is my database in
> UTF-8, fed by an unrestricted UTF-8 frontend (a website), but the
> application also deals with money here and there and needs to be able
> to output that currency symbol.

Actually, what you see on the screen is the latin1 representation of
your UTF-16BE string (see below). ^@ means chr 0 and seems to be ignored
by the PDF viewers, and UTF-16BE has the good taste to coincide with
latin1 for code points up to 255. See what less unicode_test.pdf is
giving me (I'm on a latin1 locale):

BT 36.000 744.440 Td /F1 10.0 Tf 0 Tr (^@á^@ ^@ß^@ ¬) Tj ET
BT 36.000 732.880 Td /F1 10.0 Tf 0 Tr (þÿ^@á^@ ^@ß^@ ¬) Tj ET
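
The byte-level explanation above can be checked with a short sketch. It assumes Ruby 1.9 or later, where String#encode stands in for the iconv library used in the thread; the variable names are illustrative only. It encodes the test string as UTF-16BE and then reads the same bytes back as latin1, which is roughly what a viewer with an 8-bit font encoding ends up doing.

```ruby
# Sketch only: assumes Ruby 1.9+ (String#encode), not the Ruby 1.8 + iconv
# setup from the thread.
utf8  = "\u00E1 \u00DF \u20AC"            # "á ß €"
utf16 = utf8.encode("UTF-16BE")

# Hex dump of the UTF-16BE bytes: two bytes per character.
hex = utf16.bytes.map { |b| format("%02X", b) }.join(" ")
puts hex

# Reinterpret the very same bytes as latin1 (ISO-8859-1), one byte per
# character, then convert to UTF-8 so the result can be printed.
latin1_view = utf16.dup.force_encoding("ISO-8859-1").encode("UTF-8")
puts latin1_view.inspect
```

The last two bytes, 20 AC, are the euro sign in UTF-16BE; read as latin1 they become a space followed by ¬, which matches the less output above. The characters á and ß survive only because their code points are below 256.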

Moreover, in this particular case, you are using the Helvetica
built-in font, and I'm pretty sure it doesn't have glyphs for a Euro
symbol. Finally, acroread says that the encoding of the font is 'ansi'.
That is definitely not what you want. Keep in mind that most of the
fonts (about everywhere) are defined for a small encoding (ansi/latin1,
or other 8-bit encodings). I unfortunately don't think I can help you
further. If you don't rely too much yet on PDF::Writer, you could use
pdfLaTeX as an alternative, although the PDF produced will be
significantly bigger (for small files)...

Welcome to the nightmare world of fonts and encodings...

Vince


Austin Ziegler

2/16/2007 3:40:00 PM


On 2/16/07, Xavier Noria <fxn@hashref.com> wrote:
> According to the current manual PDF documents generated by
> PDF::Writer can use UTF-16BE, but after a few trials with iconv I
> can't get my UTF-8 strings right. Example:

The manual is incorrect; I have recently figured out how to write
UTF-16 strings, but the current PDF::Writer doesn't do this (and there
are issues that I need to resolve before this will even show up in any
release of PDF::Writer).

-austin
--
Austin Ziegler * halostatue@gmail.com * http://www.halo...
* austin@halostatue.ca * http://www.halo...feed/
* austin@zieglers.ca

Simon Kröger

2/16/2007 5:56:00 PM


Vincent Fourmond wrote:

> Welcome to the nightmare world of fonts and encodings...

.... and PDF generation in Ruby.

If this helps, you can see me struggle with the same
problem here:

http://groups.google.de/group/comp.lang.ruby/browse_thread/thread/54336c6a932903fe/f0bb48...

I ended up using libharu (http://libharu.source...)

It is cross-platform, FAST, and has Ruby bindings (it is a little bit
clumsy to use and the Ruby bindings are missing some functions, but
it is the best I could find)

example:
-----------------------------------------------------------------------
require "hpdf"

pdf = HPDFDoc.new
font = pdf.get_font("Helvetica", "CP1254")

page = pdf.add_page

page.set_size(HPDFDoc::HPDF_PAGE_SIZE_A4, HPDFDoc::HPDF_PAGE_PORTRAIT)
page.set_font_and_size(font, 96)

page.begin_text

page.move_text_pos(100, 700)
page.show_text("\x80")  # 0x80 is the euro sign in CP1254

page.end_text

pdf.save_to_file "c:/temp/test.pdf"
-----------------------------------------------------------------------

With a little love to the wrapper this could be really good...
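
A note on the "\x80" in the example: in CP1254, as in CP1252, the euro sign is the single byte 0x80. A minimal sketch, assuming Ruby 1.9 or later and a hypothetical helper named to_single_byte (it is not part of libharu or its bindings), shows how UTF-8 text could be converted before it is handed to an 8-bit API such as show_text:

```ruby
# Hypothetical helper, not part of libharu: converts UTF-8 text to a
# single-byte code page. Characters the code page lacks become "?".
# Assumes Ruby 1.9+ (String#encode).
def to_single_byte(utf8_text, encoding = "Windows-1254")
  utf8_text.encode(encoding, undef: :replace, replace: "?")
end

euro = to_single_byte("\u20AC")              # "€"
p euro.bytes.to_a                            # the euro sign as one byte

mixed = to_single_byte("\u00E1 \u00DF")      # "á ß"
p mixed.bytes.to_a                           # one byte per character
```

Whether replacing unmappable characters with "?" is acceptable depends on the application; dropping the undef: and replace: options makes encode raise Encoding::UndefinedConversionError instead.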

cheers

Simon



Xavier Noria

2/17/2007 10:01:00 AM


On Feb 16, 2007, at 2:49 PM, Vincent Fourmond wrote:

> Moreover, in this particular case, you are using the Helvetica
> built-in font, and I'm pretty sure it doesn't have glyphs for a Euro
> symbol.

Austin explained the issue. Still, to understand that remark: is the
Helvetica in the PDF different from the Helvetica I use on my system?
The Helvetica here on the Mac certainly has the euro symbol.

-- fxn