comp.lang.ruby

[ANN] Text::Hyphen 1.0.0

Austin Ziegler

12/21/2004 3:41:00 AM

I just told you that I'm releasing Text::Hyphen 1.0.0, and here it is
to prove it.

Text::Hyphen README
===================
Text::Hyphen will properly hyphenate various words according to the rules of
the language the word is written in. The algorithm is based on that of the TeX
typesetting system by Donald E. Knuth. This is originally based on the Perl
implementation of TeX::Hyphen[1] and the Ruby port TeX::Hyphen[2]. The
language hyphenation pattern files are based on the sources available from
CTAN[3] as of 2004.12.19 and have been translated by Austin Ziegler.
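
For readers unfamiliar with TeX's pattern-based approach (Liang's algorithm),
here is a minimal sketch of the idea. It is not Text::Hyphen's code: the
method names are made up for illustration, and the nine patterns are the
commonly cited toy set for the word "hyphenation", not one of the language
files. Digits in a pattern rate the gap they sit in, the highest digit seen
for each gap wins, and odd values mark allowed breaks.

```ruby
# Toy pattern set (an assumption for illustration); real language files
# contain thousands of patterns.
TOY_PATTERNS = %w[hy3ph he2n hena4 hen5at 1na n2at 1tio 2io o2n].freeze

# "hy3ph" -> ["hyph", [0, 0, 3, 0, 0]]: one level before each letter,
# plus one after the last letter.
def parse_pattern(pattern)
  letters = pattern.delete('0-9')
  levels = [0]
  pattern.each_char do |ch|
    if ch =~ /\d/
      levels[-1] = ch.to_i
    else
      levels << 0
    end
  end
  [letters, levels]
end

# Returns the indexes in `word` before which a hyphen may be inserted.
def toy_hyphenation_points(word, patterns = TOY_PATTERNS, left: 2, right: 2)
  dotted = ".#{word.downcase}."            # dots mark the word boundaries
  levels = Array.new(dotted.length + 1, 0)

  patterns.each do |pattern|
    letters, digits = parse_pattern(pattern)
    start = 0
    while (start = dotted.index(letters, start))
      digits.each_with_index do |d, j|
        levels[start + j] = d if d > levels[start + j]
      end
      start += 1
    end
  end

  # levels[i + 1] rates the gap just before word[i]; odd means "may break".
  (1...word.length).select do |i|
    levels[i + 1].odd? && i >= left && word.length - i >= right
  end
end

# Insert the hyphen at each point, working from the end of the word.
def toy_visualize(word, points, hyphen = '-')
  out = word.dup
  points.reverse_each { |pt| out.insert(pt, hyphen) }
  out
end

points = toy_hyphenation_points("hyphenation")       # [2, 6]
puts toy_visualize("hyphenation", points)            # hy-phen-ation
```

The left and right keywords play the same role as the :left and :right
minimums in the README example: a break is dropped if it would leave fewer
than that many characters on either side.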

This release is 1.0, the initial release of Text::Hyphen, representing a
significant improvement over its predecessor, TeX::Hyphen.

require 'text/hyphen'
hh = Text::Hyphen.new(:language => 'en_us', :left => 2, :right => 2)
# Defaults to the above
hh = Text::Hyphen.new

word = "representation"
points = hh.hyphenate(word) #=> [3, 5, 8, 10]
puts hh.visualize(word) #=> rep-re-sen-ta-tion

Text::Hyphen is truly multilingual in nature[4]. As an example, consider the
difference between the following:

require 'text/hyphen'
# Using left and right minimum values of 0 ensures that you will see all
# possible hyphenation points, not just those that meet the minimum
# width requirements.
en = Text::Hyphen.new(:left => 0, :right => 0)
fr = Text::Hyphen.new(:language => "fr", :left => 0, :right => 0)

puts en.visualise("organiser") #=> or-gan-iser
puts fr.visualise("organiser") #=> or-ga-ni-ser

As you can see, the hyphenation is distinct between the two hyphenators.
Additional improvements over TeX::Hyphen include thread safety (except for
debug control) and support for UTF-8.

It is very important to read the LICENCE file and each language file desired,
as some languages may be held under a more strict licence than that granted by
LICENCE.

Copyright
=========
# Copyright 2004 Austin Ziegler <text-hyphen@halostatue.ca>
# See the LICENCE file for more information.

[1] <http://search.cpan.org/author/JANPAZ/TeX-Hyphen-0.140/lib/TeX/Hyp...
Maintained by Jan Pazdziora.
[2] Available at <http://rubyforge.org/projects/text-....
[3] <http://www.ct...
[4] There are some bugs and design decisions in the original Perl
implementation of TeX::Hyphen that make it unsuitable for most
multilingual uses; these carried over to the Ruby port of
TeX::Hyphen.
--
Austin Ziegler * halostatue@gmail.com
* Alternate: austin@halostatue.ca



Florian Gross

12/21/2004 4:11:00 PM


Austin Ziegler wrote:

> I just told you that I'm releasing Text::Hyphen 1.0.0, and here it is
> to prove it.

Can it rewrap text to fit into lines of X characters?

Austin Ziegler

12/21/2004 4:26:00 PM


On Wed, 22 Dec 2004 01:12:03 +0900, Florian Gross <flgr@ccan.de> wrote:
> Austin Ziegler wrote:
> > I just told you that I'm releasing Text::Hyphen 1.0.0, and here it is
> > to prove it.
> Can it rewrap text to fit into lines of X characters?

No, that's Text::Format, which I'm starting work on the next update
tonight for release later this week. Text::Format will use
Text::Hyphen or TeX::Hyphen to hyphenate words as it wraps them.

Text::Hyphen (and TeX::Hyphen) only hyphenate individual words.

-austin


Florian Gross

12/21/2004 5:02:00 PM


Austin Ziegler wrote:

>>>I just told you that I'm releasing Text::Hyphen 1.0.0, and here it is
>>>to prove it.
>>
>>Can it rewrap text to fit into lines of X characters?
>
> No, that's Text::Format, which I'm starting work on the next update
> tonight for release later this week. Text::Format will use
> Text::Hyphen or TeX::Hyphen to hyphenate words as it wraps them.

Nice, can't wait for that library.

Austin Ziegler

12/21/2004 6:53:00 PM


On Wed, 22 Dec 2004 02:07:02 +0900, Florian Gross <flgr@ccan.de> wrote:
> Austin Ziegler wrote:
> >>>I just told you that I'm releasing Text::Hyphen 1.0.0, and here it is
> >>>to prove it.
> >>
> >>Can it rewrap text to fit into lines of X characters?
> > No, that's Text::Format, which I'm starting work on the next update
> > tonight for release later this week. Text::Format will use
> > Text::Hyphen or TeX::Hyphen to hyphenate words as it wraps them.
> Nice, can't wait for that library.

It's actually already available, but not from RubyForge, yet. Check
the RAA entry for text-format.

The update will fix a couple of reported bugs and add a couple of
requested features. It will also change the initialization mechanism
-- so be careful (but the version number will be 1.0 instead of the
current 0.64). A future update will look at integrating most of the
features from par and perhaps other features.

-austin


Kaspar Schiess

12/22/2004 3:41:00 PM


Austin Ziegler wrote (news:9e7db91104122108267c480511@mail.gmail.com):

> No, that's Text::Format, which I'm starting work on the next update
> tonight for release later this week.

Hey Austin,

I have ported the Text::Reform Perl library into Ruby, also using
TeX::Hyphen for its hyphenation. It's still unreleased, but if you want to,
we could make a kind of Xmas joint release by the end of the year?

I guess I could even include a 'break_hyphen' (using Text::Hyphenate) in
addition to 'break_TeX' (using TeX::Hyphenate).

I am also working on Autoformat, although that has not progressed as far.
Is there a Rubyforge site that specialises in Perl ports ?

yours, kaspar

hand manufactured code - www.tua.ch/ruby



Austin Ziegler

12/22/2004 3:55:00 PM


On Thu, 23 Dec 2004 00:40:45 +0900, Kaspar Schiess <eule@space.ch>
wrote:
> Austin Ziegler wrote
> (news:9e7db91104122108267c480511@mail.gmail.com):
>> No, that's Text::Format, which I'm starting work on the next
>> update tonight for release later this week.
> I have ported the Text::Reform Perl library into Ruby, also using
> TeX::Hyphen for its hyphenation. It's still unreleased, but if you
> want to, we could make a kind of Xmas joint release by the end of
> the year?

Well, the bad news is that I'm going on a short vacation on 26th
December and am very busy until then. If you want to send me what
you have to look over, I'll be happy to do so. Also, I will be happy
to host any text formatting related projects within the Text
Formatting project on RubyForge (http://rubyforge.org/t...).

> I guess I could even include a 'break_hyphen' (using
> Text::Hyphenate) in addition to 'break_TeX' (using
> TeX::Hyphenate).

Actually, both TeX::Hyphen and Text::Hyphenate respond to a very
useful method, #hyphenate_to, which will hyphenate a word to a
certain number of characters *including the hyphen*. This is the
single mechanism by which the Text::Format library can use the
hyphenator as a plugin. Text::Hyphenate uses the same rules as
TeX::Hyphen -- it just uses different initialization and rule
specification.
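
To illustrate what "hyphenate to a certain number of characters including
the hyphen" means, here is a rough sketch. It is not the library's
implementation: split_to_width is a hypothetical helper, and returning
[nil, word] when nothing fits is an assumed convention. It reuses the
hyphenation points for "representation" from the README example.

```ruby
# Hypothetical helper: split `word` at the last hyphenation point whose
# first part, plus the hyphen, still fits within `size` characters.
def split_to_width(word, points, size, hyphen = '-')
  points.reverse_each do |pt|
    next if pt + hyphen.length > size
    return [word[0...pt] + hyphen, word[pt..-1]]
  end
  [nil, word] # assumed convention: nothing fits, keep the word whole
end

word   = "representation"
points = [3, 5, 8, 10] # from the README example

p split_to_width(word, points, 9) # ["represen-", "tation"]
p split_to_width(word, points, 7) # ["repre-", "sentation"]
p split_to_width(word, points, 3) # [nil, "representation"]
```

A line wrapper only needs this one call: it asks for the longest piece that
fits in the space left on the current line and carries the remainder over.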

TeX::Hyphen -- the original Perl version, not the port initially
done by Martin DeMello -- has a number of glaring problems. As I
looked at the situation, fixing them would have required
reimplementing a not-small portion of the TeX code in Ruby, and the
result still would not have been compatible with non-TeX text
encodings (e.g., ISO-8859-x and/or UTF-8 or anything else). Thus, I
spent a not-inconsiderable amount of time going through the TeX font
encoding documentation and comparing it against UTF-8 and ISO-8859-x
to convert T1's \355 to whatever its equivalent would be in UTF-8,
etc. This doesn't affect English formatting, but I support Polish and
several other languages that include characters like š and č and ű
that just don't show up in latin1/ISO-8859-1[5].

> I am also working on Autoformat, although that has not progressed
> as far. Is there a Rubyforge site that specialises in Perl ports ?

Martin DeMello has also expressed interest in porting 'par' to Ruby.

Would you like to join Text Formatting as a developer?

-austin



Kaspar Schiess

12/22/2004 4:11:00 PM


Hello Austin,

> If you want to send me what
> you have to look over, I'll be happy to do so.

I would really like to have my library peer-reviewed by you. I just want
to remind you of two things before I send you the code:

a) I was not aiming at a clean Ruby version, but porting the
functionality. Perl is sometimes a little odd to express in Ruby, and my
version of Text::Reform still has that oddness. It'll go away in future
versions.

b) I am essentially working on documentation (which is excellent in Perl
stuff) and on unit tests. That work has to be complete before the first
release.

If you're still interested in looking through it, yes, I would like that
very much.

> Actually, both TeX::Hyphen and Text::Hyphenate respond to a very
> useful method, #hyphenate_to, which will hyphenate a word to a
> certain number of characters *including the hyphen*. This is the
> single mechanism by which the Text::Format library can use the
> hyphenator as a plugin.

It will eventually work that way. But right now, I am pursuing my own
little hyphenation-abstraction-layer, which asks you to supply a class
that answers to '#break'. Plus, the Text::Reform class features a number
of factory methods for such classes, like the Perl version does. It has
#break_wrap, #break_at, #break_TeX and (will have) #break_hyphenate. This
is one of the things that I will adapt to be more Ruby-like, perhaps by
allowing a block that does the hyphenation to be passed in.
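
As a rough sketch of that kind of abstraction layer (not the actual
Text::Reform port): the names WrapBreaker and fill are hypothetical, and the
contract assumed here is that a breaker takes (text, width) and returns
[piece_that_fits, remainder]. The filler accepts either an object that
answers to #break or a block with the same contract.

```ruby
# Hypothetical breaker: break at the last space that fits, otherwise cut
# hard at `width`.
class WrapBreaker
  def break(text, width)
    return [text, ''] if text.length <= width
    cut = text.rindex(' ', width)
    if cut && cut > 0
      [text[0...cut], text[(cut + 1)..-1]]
    else
      [text[0...width], text[width..-1]]
    end
  end
end

# Fill `text` into lines of at most `width` characters, delegating every
# break decision to the breaker object or to the given block.
def fill(text, width, breaker = nil, &block)
  raise ArgumentError, 'width must be positive' if width < 1
  raise ArgumentError, 'need a breaker or a block' unless block || breaker
  splitter = block || breaker.method(:break)
  lines = []
  rest = text
  until rest.empty?
    piece, rest = splitter.call(rest, width)
    lines << piece
  end
  lines
end

p fill("the quick brown fox", 10, WrapBreaker.new)
# ["the quick", "brown fox"]

# Block form: a plain hard cut every `w` characters.
p fill("abcdefgh", 3) { |t, w| [t[0...w], t[w..-1] || ''] }
# ["abc", "def", "gh"]
```

A hyphenating breaker would slot in the same way, returning the hyphenated
head of a long word as the piece and the tail as the remainder.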

>> I am also working on Autoformat, although that has not progressed
>> as far. Is there a Rubyforge site that specialises in Perl ports ?
>
> Martin DeMello has also expressed interest in porting 'par' to Ruby.

Text::Autoformat is fully functional, it's just that I don't have
documentation and unit tests for it, so I won't be releasing it. Tell me
if you want to have a look at that work in progress.

> Would you like to join Text Formatting as a developer?

I guess it's a good idea to not stick to the idea of collecting those
projects as 'perl ports projects', but rather thematically group them
with respect to purpose. Yes, I think these two projects would be
coherent with the purpose of your text formatting rubyforge project, so
count me in. Glad to have found a way to release, I was getting to that
problem in a few days ;).

yours, kaspar



Bil Kleb

12/25/2004 1:07:00 PM


Austin Ziegler wrote:
>
> Martin DeMello has also expressed interest in porting 'par' to Ruby.
>
> Would you like to join Text Formatting as a developer?

Being a technical typesetting hobbyist this whole discussion is
rather intriguing. Where can I read more about where all these
bits are heading?

--
Bil Kleb, Hampton, Virginia
http://fun3d.lar...

Austin Ziegler

12/25/2004 3:29:00 PM


On Sat, 25 Dec 2004 22:11:55 +0900, Bil Kleb <Bil.Kleb@nasa.gov> wrote:
> Austin Ziegler wrote:
> > Martin DeMello has also expressed interest in porting 'par' to Ruby.

> > Would you like to join Text Formatting as a developer?
> Being a technical typesetting hobbyist this whole discussion is
> rather intriguing. Where can I read more about where all these
> bits are heading?

Right at the moment? ...

Nowhere.

In the future?

Somewhere; probably on the Text Formatting page. One of the things
that I'd like to do is take Text::Format and Text::Hyphen and be
able to plug them into PDF::Writer -- using proportional character
sizes instead of fixed character sizes.

Because I am having to recover from a hard drive crash, it will
obviously not be possible for me to deliver Text::Format 1.0
before I leave on a week's vacation to the beaches of Cuba. Look for
it sometime in the new year.

-austin


Kaspar Schiess

12/29/2004 8:56:00 AM


(In response to news:cqjom0$mfg$1@news2.news.larc.nasa.gov by Bil Kleb)

> Being a technical typesetting hobbyist this whole discussion is
> rather intriguing. Where can I read more about where all these
> bits are heading?

I guess we could give you a good prediction about future package
availability if you tell us what you are looking for. I plan to port some
packages myself, but I guess we're still far behind the Perl community in
this area... The upside being that we can port instead of having to code
ourselves.

I got to be a typesetting hobbyist after finally getting around to
understanding some of the LaTeX system. Where does your interest come
from? Is it just the 'Oh-No-Word-Messed-Up-My-Text' horror?

kaspar

hand manufactured code - www.tua.ch/ruby