Robert Klemme
6/19/2007 7:54:00 AM
On 19.06.2007 09:33, Alin Popa wrote:
> Alin Popa wrote:
>> Alex Young wrote:
>>> Alin Popa wrote:
>>>> Hi guys,
>>>>
>>>> After some research I still cannot find a way to tell whether a file is
>>>> plain text or binary. In fact, I want to check whether a file is plain
>>>> text, no matter what characters are in it.
>>>> Is this possible in Ruby?
>>> I think so, but it's a little unclear exactly what you're trying to
>>> achieve. Do you have an example?
>> I'm trying to do a replace in file for some text but I don't want to
>> consider files like archives or other binary files.
>
> Of course, on Windows I can go by the file extension and ignore
> specific ones (e.g. .exe, .zip, .jar, .rar, .anything_i_want),
> but I don't know how to do it on Linux/Unix, where a file extension is
> not mandatory.
You could read the file (or a portion of it), build a histogram of
byte (or byte group) occurrences, and compare that to what you
expect for text files (e.g. most characters are "0-9a-zA-Z" and punctuation).
You could also run the "file" command and parse its output.
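
Both ideas could be sketched roughly like this. Treat it as a starting point only: the method names, the 4096-byte sample, the 0.9 threshold and the NUL-byte shortcut are my own assumptions, and the byte check is a simplified stand-in for a full histogram. It only counts printable ASCII plus tab/newline/CR, so text with many non-ASCII characters (e.g. UTF-8) would need a looser rule. The second method assumes a "file" binary that understands --brief and --mime-encoding, which older versions may not.

```ruby
require "open3"

# Hypothetical helper: guess whether a file is text by sampling its first
# bytes and measuring the share of "expected" text bytes.
def looks_like_text?(path, sample_size: 4096, threshold: 0.9)
  sample = File.open(path, "rb") { |f| f.read(sample_size) }
  return true if sample.nil? || sample.empty?   # assumption: empty files count as text
  return false if sample.include?("\x00".b)      # assumption: a NUL byte means binary

  # Count printable ASCII plus tab, line feed and carriage return.
  expected = sample.each_byte.count do |b|
    (0x20..0x7E).cover?(b) || [0x09, 0x0A, 0x0D].include?(b)
  end
  expected.fdiv(sample.bytesize) >= threshold
end

# Hypothetical helper: ask the external "file" command instead.
# Assumes "file" is on the PATH and reports "binary" for non-text data.
def text_according_to_file?(path)
  out, status = Open3.capture2("file", "--brief", "--mime-encoding", "--", path)
  status.success? && out.strip != "binary"
end
```

For the replace-in-files job you would then skip anything for which looks_like_text?(path) returns false. Which threshold works depends on the files you have, so it is worth trying it on a few known archives and text files first.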
Kind regards
robert