
comp.lang.ruby

Re: curly quotes and forcing ASCII

Ezra Zygmuntowicz

1/14/2006 12:59:00 AM

Listers-

I'm doing some text munging for the obituaries that go online at the
newspaper I work for. I have a simple question: is there a way to
convert some text to plain ASCII? The text I am processing has
curly quotes and a few other rich-text characters in it. Is there a pure
Ruby way of converting these characters into their plain ASCII
counterparts, short of a regex for each character?

Thanks-
-Ezra



3 Answers

Christian Neukirchen

1/14/2006 4:50:00 PM


Ezra Zygmuntowicz <ezmobius@gmail.com> writes:

> Listers-
>
> I'm doing some text munging for the obituaries that go online at
> the newspaper I work for. I have a simple question: is there
> a way to convert some text to plain ASCII? The text I
> am processing has curly quotes and a few other rich-text
> characters in it. Is there a pure Ruby way of converting these
> characters into their plain ASCII counterparts, short of a regex
> for each character?

What format is the text in? Could you maybe post a snippet, if possible?

(And, btw, can you tell me the URL of that newspaper? I'm very
interested in putting obituaries online, since that is basically the
only reason our local newspaper gets read at all...)

> Thanks-
> -Ezra
--
Christian Neukirchen <chneukirchen@gmail.com> http://chneuk...


Ezra Zygmuntowicz

1/15/2006 9:32:00 PM



On Jan 14, 2006, at 8:50 AM, Christian Neukirchen wrote:

> Ezra Zygmuntowicz <ezmobius@gmail.com> writes:
>
>> Listers-
>>
>> I'm doing some text munging for the obituaries that go online at
>> the newspaper I work for. I have a simple question: is there
>> a way to convert some text to plain ASCII? The text I
>> am processing has curly quotes and a few other rich-text
>> characters in it. Is there a pure Ruby way of converting these
>> characters into their plain ASCII counterparts, short of a regex
>> for each character?
>
> What format is the text in? Could you maybe post a snippet, if
> possible?
>
> (And, btw, can you tell me the URL of that newspaper? I'm very
> interested in putting obituaries online, since that is basically the
> only reason our local newspaper gets read at all...)
>
>> Thanks-
>> -Ezra
> --
> Christian Neukirchen <chneukirchen@gmail.com> http://chneukirchen.org
>

Christian-

I won't be in the office until Tuesday, but I will post a sample
then. The URL of the newspaper is http://yakima... . The whole
site runs on Rails, and I have a ton of Ruby code that ties together
the different departments as well: lots of text processing between
classified, newsroom, and web. Our entire intranet also runs on Ruby:
circulation, accounting, prepress, surveys, and employee reviews.

Obituaries are a big traffic draw to our web site as well. That's why
we are working on a better system. Right now the obits don't make it
online until the day after they are in the paper, and that's not right.
So instead of letting the obits make their way through the newsroom
database system, I am going to bypass it and send them straight to the
web. The text comes from a Mac OS 9 machine, so it
has \r for line endings and uses curly quotes and a few other characters
that don't translate well to being displayed on the web. The database
that I pull them out of is an old proprietary BaseView DB, and the
company is not very forthcoming in helping us use the system in ways
they didn't already envision.

Cheers-
-Ezra
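
For what it's worth, here is a minimal sketch of the clean-up step described above. It assumes the bytes coming out of the database are MacRoman-encoded, which is only a guess based on the Mac OS 9 origin and which the thread does not confirm. It also assumes a Ruby with encoding support (1.9 or later). The method name mac_to_utf8 is made up for illustration.

```ruby
# Sketch only: assumes the raw bytes are MacRoman (not confirmed in this thread).
# mac_to_utf8 is a hypothetical helper name.
def mac_to_utf8(raw)
  raw.dup                         # avoid mutating the caller's string
     .force_encoding("macRoman")  # label the bytes as MacRoman (assumption)
     .encode("UTF-8")             # transcode to UTF-8
     .gsub(/\r\n?/, "\n")         # turn classic Mac (and DOS) line endings into \n
end
```

If the real data turns out to be in some other encoding, only the name passed to force_encoding would need to change. Once the text is UTF-8, the curly quotes show up as ordinary Unicode characters and can be displayed or replaced as needed.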



Alex LeDonne

1/18/2006 3:27:00 PM


On 1/13/06, Ezra Zygmuntowicz <ezmobius@gmail.com> wrote:
> Listers-
>
> I'm doing some text munging for the obituaries that go online at the
> newspaper I work for. I have a simple question: is there a way to
> convert some text to plain ASCII? The text I am processing has
> curly quotes and a few other rich-text characters in it. Is there a pure
> Ruby way of converting these characters into their plain ASCII
> counterparts, short of a regex for each character?
>
> Thanks-
> -Ezra
>

I suspect you'll have to define the mapping yourself... the Unicode
characters U+201C LEFT DOUBLE QUOTATION MARK and U+201D RIGHT DOUBLE
QUOTATION MARK don't appear to have any defined Unicode
composition/decomposition mappings.

http://www.unicode.org/Public/UNIDATA/Unico... for data,
http://www.unicode.org/Public/UNIDATA/UCD.html#Decompositi...
for further detail

-A
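
A hand-rolled mapping along those lines might look like the sketch below. It assumes the text is already UTF-8 and a Ruby new enough (1.9 or later) for gsub to accept a hash of replacements. The table and the names ASCII_REPLACEMENTS, ASCII_PATTERN and to_plain_ascii are hypothetical; the entries beyond the two double quotes named above are guesses at the "few other" characters and would need adjusting to the real data.

```ruby
# Hypothetical mapping table: extend it with whatever shows up in the real text.
ASCII_REPLACEMENTS = {
  "\u201C" => '"',   # LEFT DOUBLE QUOTATION MARK
  "\u201D" => '"',   # RIGHT DOUBLE QUOTATION MARK
  "\u2018" => "'",   # LEFT SINGLE QUOTATION MARK
  "\u2019" => "'",   # RIGHT SINGLE QUOTATION MARK
  "\u2013" => "-",   # EN DASH
  "\u2014" => "--",  # EM DASH
  "\u2026" => "..."  # HORIZONTAL ELLIPSIS
}.freeze

# One regex built from all the keys, so there is no need for a regex per character.
ASCII_PATTERN = Regexp.union(ASCII_REPLACEMENTS.keys)

# Replace every mapped character with its ASCII stand-in in a single pass.
def to_plain_ascii(text)
  text.gsub(ASCII_PATTERN, ASCII_REPLACEMENTS)
end
```

Anything not listed in the table passes through unchanged, so this does not by itself guarantee pure ASCII output; characters that are missing from the table would still have to be found and added.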