comp.lang.ruby

Text encodings

xTRiM

7/10/2006 10:26:00 AM

Hello,

is there any way to detect the encoding of a text?
For example, whether it is UTF-8, Windows-1251, or something else.

Thank you.

4 Answers

Paul Battley

7/10/2006 11:09:00 AM

On 10/07/06, xTRiM <rtokarev@gmail.com> wrote:
> is there any way, to detect text encoding?
> For example, is it in utf8, or in win1251, or something else.

You can't detect one-byte-per-character encodings easily (i.e. without
statistical analysis) but you can easily tell if something's UTF-8 or
not:

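class String
  # True if the string's bytes decode as UTF-8. unpack('U*') raises on
  # malformed input, which the method-level rescue turns into false.
  def is_utf8?
    unpack('U*')
    true
  rescue
    false
  end
end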

"foo".is_utf8? #=> true
"foo\303".is_utf8? #=> false

Not the most efficient way, necessarily, but probably the easiest.
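
For readers on Ruby 1.9 or later, the same check can be sketched with the built-in String#valid_encoding?. The helper names utf8? and to_utf8 below are hypothetical, and falling back to Windows-1251 is only an assumption taken from the original question, not a real detection of a single-byte encoding:

```ruby
# Returns true when the given bytes form well-formed UTF-8.
def utf8?(bytes)
  bytes.dup.force_encoding(Encoding::UTF_8).valid_encoding?
end

# Hypothetical helper: keep the bytes if they are valid UTF-8, otherwise
# ASSUME they are in `fallback` and transcode. This is a guess, not detection.
# Bytes that are undefined in the fallback encoding make encode raise.
def to_utf8(bytes, fallback: "Windows-1251")
  return bytes.dup.force_encoding(Encoding::UTF_8) if utf8?(bytes)
  bytes.dup.force_encoding(fallback).encode(Encoding::UTF_8)
end

puts utf8?("foo")
puts utf8?("foo\xC3".b)
puts to_utf8("\xCF\xF0\xE8\xE2\xE5\xF2".b)
```

Because many Windows-1251 byte sequences are not valid UTF-8, checking UTF-8 first and falling back afterwards tends to work when only those two encodings are possible.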

Paul.

Takashi Sano

7/10/2006 11:47:00 AM

Hi,

2006/7/10, xTRiM <rtokarev@gmail.com>:
> Hello,
>
> is there any way, to detect text encoding?
> For example, is it in utf8, or in win1251, or something else.
>

You can use the guess or guess2 method of the standard library's NKF
module (Ruby 1.8.2 or later) for that. Look up the NKF section in
http://www.ruby-doc.o....
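
A minimal sketch of calling NKF.guess, assuming Ruby 1.9 or later (where it returns an Encoding object) and that the nkf library is available (on recent Rubies it may need to be installed as a gem). Note that NKF targets Japanese encodings, so it is unlikely to help with win1251. The wrapper name guess_japanese_encoding is hypothetical:

```ruby
require 'nkf'

# Hypothetical wrapper around NKF.guess, which inspects the bytes and
# returns its best guess as an Encoding object.
def guess_japanese_encoding(bytes)
  NKF.guess(bytes)
end

# Build a Shift_JIS sample from a UTF-8 literal and print NKF's guess.
sample = "こんにちは、世界".encode("Shift_JIS")
puts guess_japanese_encoding(sample)
```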

Takashi Sano

Tim Bray

7/10/2006 5:48:00 PM

On Jul 10, 2006, at 4:47 AM, Takashi Sano wrote:

>> is there any way, to detect text encoding?
>> For example, is it in utf8, or in win1251, or something else.
>
> You can use the standard lib NKF's guess or guess2 (ruby 1.8.2 or
> later) method for that. Look up the NKF section in
> http://www.ruby-doc.o....

In the general case, there's *no safe way* to do this, unless the
data is XML or comes with an HTTP header from a reliable server (ha
ha ha, I'm sure there must be one somewhere). Probably the best
auto-detector is Mark Pilgrim's, but it's in Python:
http://chardet.feedparser.org/
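
When the data does declare its encoding, reading that declaration can be sketched in Ruby as follows. Both helper names are hypothetical, the regular expressions are simplified assumptions rather than full HTTP or XML parsers, and the XML helper only handles ASCII-compatible encodings (a UTF-16 document would need a byte-order-mark check first):

```ruby
# Extracts the charset parameter from an HTTP Content-Type header value,
# e.g. "text/html; charset=windows-1251". Returns nil when none is present.
def charset_from_content_type(header)
  match = header.to_s.match(/charset\s*=\s*"?([^\s;"]+)/i)
  match ? match[1] : nil
end

# Reads the encoding pseudo-attribute from an XML declaration at the very
# start of the document. Only the first 200 bytes are examined, as raw bytes.
def encoding_from_xml_declaration(bytes)
  head = bytes.byteslice(0, 200).to_s.b
  match = head.match(/\A<\?xml[^>]*\bencoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']/)
  match ? match[1] : nil
end

puts charset_from_content_type("text/html; charset=windows-1251")
puts encoding_from_xml_declaration('<?xml version="1.0" encoding="UTF-8"?><doc/>')
```

A declared name is still only a claim by the sender, so checking the decoded result with valid_encoding? afterwards is a reasonable precaution.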

-Tim


Jacob Harris

7/10/2006 6:19:00 PM

> On Jul 10, 2006, at 4:47 AM, Takashi Sano wrote:
>
>>> is there any way, to detect text encoding?
>>> For example, is it in utf8, or in win1251, or something else.
>>
>> You can use the standard lib NKF's guess or guess2 (ruby 1.8.2 or
>> later) method for that. Look up the NKF section in
>> http://www.ruby-doc.o....
>
> In the general case, there's *no safe way* to do this, unless the
> data is XML or comes with an HTTP header from a reliable server (ha
> ha ha, I'm sure there must be one somewhere). Probably the best
> auto-detector is Mark Pilgrim's, but it's in Python:
> http://chardet.feedparser.org/
>
> -Tim

Nice pointer, Tim. I'll have to check that out. Incidentally, a quick web
search turned up a Ruby port (I have not evaluated it in any way, though):
http://rubyforge.org/project... by Hui Zheng
The gem name is "chardet".

Jake