comp.lang.ruby

Test if file is binary ?

Rebhan, Gilbert

8/21/2007 6:04:00 AM


Hi,

how can I test whether a file is binary or not?

There is nothing like File.binary?:
NoMethodError: undefined method `binary?' for File:Class

Any ideas or libraries available?

Regards, Gilbert

25 Answers

Dejan Dimic

8/21/2007 6:46:00 AM

On Aug 21, 8:04 am, "Rebhan, Gilbert" <Gilbert.Reb...@huk-coburg.de>
wrote:
> Hi,
>
> how can I test whether a file is binary or not?
>
> There is nothing like File.binary?:
> NoMethodError: undefined method `binary?' for File:Class
>
> Any ideas or libraries available?
>
> Regards, Gilbert

What do you need to achieve with this is_binary? method?
All files are just collections of bytes, so from one perspective they
are all binary. We interpret them as suits our needs.

Rebhan, Gilbert

8/21/2007 6:57:00 AM


Hi,

-----Original Message-----
From: dima [mailto:dejan.dimic@gmail.com]
Sent: Tuesday, August 21, 2007 8:50 AM
To: ruby-talk ML
Subject: Re: Test if file is binary ?

On Aug 21, 8:04 am, "Rebhan, Gilbert" <Gilbert.Reb...@huk-coburg.de>
wrote:
>> Hi,
>>
>> how can I test whether a file is binary or not?
>>
>> There is nothing like File.binary?:
>> NoMethodError: undefined method `binary?' for File:Class
>>
>> Any ideas or libraries available?

> What do you need to achieve with this is_binary? method?
> All files are just collections of bytes, so from one perspective they
> are all binary. We interpret them as suits our needs.

For example, this information is needed to decide whether
CVS should handle a file (or file extension) as binary or ASCII.

Regards, Gilbert


Robert Klemme

8/21/2007 7:05:00 AM

2007/8/21, Rebhan, Gilbert <Gilbert.Rebhan@huk-coburg.de>:
>
> Hi,
>
> how can I test whether a file is binary or not?
>
> There is nothing like File.binary?:
> NoMethodError: undefined method `binary?' for File:Class
>
> Any ideas or libraries available?

If I really needed it, I'd probably use a heuristic based on the
distribution of byte values across an initial portion of the file.
Something like this:

class File
  def self.binary?(name)
    ascii = control = binary = 0

    File.open(name, "rb") {|io| io.read(1024)}.each_byte do |bt|
      case bt
      when 0...32
        control += 1
      when 32...128
        ascii += 1
      else
        binary += 1
      end
    end

    control.to_f / ascii > 0.1 || binary.to_f / ascii > 0.05
  end
end

Kind regards

robert
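
A note on edge cases in the heuristic above: IO#read with a length returns nil at end of file, so an empty file would raise on each_byte, and a sample with no ASCII bytes divides by zero. The following is only a minimal sketch of one way to guard against both. The module name BinaryCheck, the decision to treat empty files as text, and counting tab, LF and CR as text are all assumptions, not part of the posted code.

```ruby
# Hypothetical helper; BinaryCheck is not a name from the thread.
module BinaryCheck
  SAMPLE_SIZE = 1024 # same initial portion the posted code reads

  def self.binary?(name)
    # IO#read(n) returns nil at end of file, so an empty file yields nil.
    sample = File.open(name, "rb") { |io| io.read(SAMPLE_SIZE) }
    return false if sample.nil? || sample.empty? # assumption: empty counts as text

    ascii = control = high = 0
    sample.each_byte do |byte|
      case byte
      when 9, 10, 13 # assumption: tab, LF and CR count as text
        ascii += 1
      when 0...32
        control += 1
      when 32...128
        ascii += 1
      else
        high += 1
      end
    end

    return true if ascii.zero? # nothing text-like at all; also avoids 0 / 0
    control.to_f / ascii > 0.1 || high.to_f / ascii > 0.05
  end
end
```

The thresholds 0.1 and 0.05 are the ones from the post; whether they suit a given set of files would still have to be checked against real data.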

Rebhan, Gilbert

8/21/2007 7:13:00 AM


Hi,

-----Original Message-----
From: Robert Klemme [mailto:shortcutter@googlemail.com]
Sent: Tuesday, August 21, 2007 9:05 AM
To: ruby-talk ML
Subject: Re: Test if file is binary ?

2007/8/21, Rebhan, Gilbert <Gilbert.Rebhan@huk-coburg.de>:
>
> Hi,
>
> how can I test whether a file is binary or not?
>
> There is nothing like File.binary?:
> NoMethodError: undefined method `binary?' for File:Class
>
> Any ideas or libraries available?

/*

If I really needed it, I'd probably use a heuristic based on the
distribution of byte values across an initial portion of the file.
Something like this:

class File
  def self.binary?(name)
    ascii = control = binary = 0

    File.open(name, "rb") {|io| io.read(1024)}.each_byte do |bt|
      case bt
      when 0...32
        control += 1
      when 32...128
        ascii += 1
      else
        binary += 1
      end
    end

    control.to_f / ascii > 0.1 || binary.to_f / ascii > 0.05
  end
end

*/


Nice :-) Thanks !!

Regards, Gilbert

Alex Gutteridge

8/21/2007 7:22:00 AM

On 21 Aug 2007, at 15:57, Rebhan, Gilbert wrote:

>
> Hi,
>
> -----Original Message-----
> From: dima [mailto:dejan.dimic@gmail.com]
> Sent: Tuesday, August 21, 2007 8:50 AM
> To: ruby-talk ML
> Subject: Re: Test if file is binary ?
>
> On Aug 21, 8:04 am, "Rebhan, Gilbert" <Gilbert.Reb...@huk-coburg.de>
> wrote:
>>> Hi,
>>>
>>> how can I test whether a file is binary or not?
>>>
>>> There is nothing like File.binary?:
>>> NoMethodError: undefined method `binary?' for File:Class
>>>
>>> Any ideas or libraries available?
>
>> What do you need to achieve with this is_binary? method?
>> All files are just collections of bytes, so from one perspective they
>> are all binary. We interpret them as suits our needs.
>
> For example, this information is needed to decide whether
> CVS should handle a file (or file extension) as binary or ASCII.
>
> Regards, Gilbert

One simple approach is this:

class File
  def is_binary?
    ascii = 0
    total = 0
    self.read(1024).each_byte { |c| total += 1; ascii += 1 if c >= 128 or c == 0 }
    ascii.to_f / total.to_f > 0.33 ? true : false
  end
end

You can tweak the 0.33 value if you like. There are probably better
(i.e. more robust) ways out there, though.

Alex Gutteridge

Bioinformatics Center
Kyoto University



Alex Gutteridge

8/21/2007 7:24:00 AM

Sorry for the duplicate! Robert is too fast for me.

Alex Gutteridge

Bioinformatics Center
Kyoto University



Robert Klemme

8/21/2007 7:41:00 AM

2007/8/21, Alex Gutteridge <alexg@kuicr.kyoto-u.ac.jp>:
> Sorry for the duplicate! Robert is too fast for me.

It's always good to see more solutions. I like the conciseness of
your solution. But I think this should rather be a class method
because you would not do the test on an open stream. Dunno which of
the solutions is more realistic. Might be fun to let both approaches
test a large number of files and compare their results (probably also
with output from "file"). :-)

Btw, you should get rid of the ternary operator - it's totally
superfluous because there is no point in converting a boolean value
into a boolean value. :-)

Kind regards

robert
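
The comparison suggested above could be sketched roughly like this: apply both rules from the thread to the same 1024-byte sample and list the files on which they disagree. The module and method names (HeuristicCompare, distribution_binary?, ratio_binary?, disagreements) are made up for illustration, the handling of empty samples is an assumption, and the check against the output of "file" is left out.

```ruby
# Hypothetical names throughout; none of them come from the thread.
module HeuristicCompare
  SAMPLE_SIZE = 1024

  # Byte-distribution rule: control and high bytes relative to ASCII bytes.
  def self.distribution_binary?(sample)
    ascii = control = high = 0
    sample.each_byte do |b|
      if b < 32
        control += 1
      elsif b < 128
        ascii += 1
      else
        high += 1
      end
    end
    # Assumption: empty sample is text, a sample without ASCII bytes is binary.
    return !sample.empty? if ascii.zero?
    control.to_f / ascii > 0.1 || high.to_f / ascii > 0.05
  end

  # Ratio rule: share of NUL and high bytes above one third.
  def self.ratio_binary?(sample)
    return false if sample.empty? # assumption: empty sample is text
    suspicious = sample.each_byte.count { |b| b >= 128 || b.zero? }
    suspicious.to_f / sample.bytesize > 0.33
  end

  # Returns the paths on which the two rules give different answers.
  def self.disagreements(paths)
    paths.select do |path|
      sample = File.open(path, "rb") { |io| io.read(SAMPLE_SIZE) } || ""
      distribution_binary?(sample) != ratio_binary?(sample)
    end
  end
end
```

Something like HeuristicCompare.disagreements(Dir.glob("**/*").select { |p| File.file?(p) }) would then give a list of files worth inspecting by hand.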

Rebhan, Gilbert

8/21/2007 8:21:00 AM



-----Original Message-----
From: Robert Klemme [mailto:shortcutter@googlemail.com]
Sent: Tuesday, August 21, 2007 9:41 AM
To: ruby-talk ML
Subject: Re: Test if file is binary ?

2007/8/21, Alex Gutteridge <alexg@kuicr.kyoto-u.ac.jp>:
> Sorry for the duplicate! Robert is too fast for me.

/*
It's always good to see more solutions. I like the conciseness of
your solution. But I think this should rather be a class method
because you would not do the test on an open stream. Dunno which of
the solutions is more realistic.
*/

You mean it should be something like this?

class File
  def self.is_binary?(name)
    ascii = total = 0
    File.open(name, "rb") { |io| io.read(1024) }.each_byte do |c|
      total += 1
      ascii += 1 if c >= 128 or c == 0
    end
    ascii.to_f / total.to_f > 0.33
  end
end


/*
Might be fun to let both approaches
test a large number of files and compare their results (probably also
with output from "file"). :-)
*/

Is there an existing standard for what is considered a binary file?
I mean a rule like: check the first block of a file and

- if control characters (ASCII 0-31) and "high ASCII" (> 127) make up
more than 30 % of it, it's considered a binary file, otherwise a text file

- if no control characters (ASCII 0-31) and no "high ASCII" (> 127) are
found at all, it's always considered a text file

??


Regards, Gilbert




Xavier Noria

8/21/2007 8:25:00 AM

On Aug 21, 2007, at 10:21 AM, Rebhan, Gilbert wrote:

> Is there an existing standard for what is considered a binary file?
> I mean a rule like: check the first block of a file and
>
> - if control characters (ASCII 0-31) and "high ASCII" (> 127) make up
> more than 30 % of it, it's considered a binary file, otherwise a text file
>
> - if no control characters (ASCII 0-31) and no "high ASCII" (> 127) are
> found at all, it's always considered a text file
>
> ??

What's the heuristic in Subversion?

-- fxn


Rebhan, Gilbert

8/21/2007 8:34:00 AM



-----Original Message-----
From: Xavier Noria [mailto:fxn@hashref.com]
Sent: Tuesday, August 21, 2007 10:25 AM
To: ruby-talk ML
Subject: Re: Test if file is binary ?

On Aug 21, 2007, at 10:21 AM, Rebhan, Gilbert wrote:

> Is there an existing standard for what is considered a binary file?
> I mean a rule like: check the first block of a file and
>
> - if control characters (ASCII 0-31) and "high ASCII" (> 127) make up
> more than 30 % of it, it's considered a binary file, otherwise a text file
>
> - if no control characters (ASCII 0-31) and no "high ASCII" (> 127) are
> found at all, it's always considered a text file
>
> ??

/*
What's the heuristic in Subversion?
*/

The Subversion FAQ
http://subversion.tigris.org/faq.html#bi... says:
" ...
if any of the bytes are zero, or if more than 15% are not ASCII printing
characters, then Subversion calls the file binary. This heuristic might
be improved in the future, however."

Regards, Gilbert
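
The rule quoted from the FAQ could be written in Ruby roughly as follows. This is only a sketch of the quoted sentence, not Subversion's actual code. The module name SvnStyleCheck, the 1024-byte sample (borrowed from the code earlier in the thread), the reading of "ASCII printing characters" as bytes 32..126 plus tab, LF and CR, and treating an empty file as text are all assumptions.

```ruby
# Hypothetical helper; not taken from Subversion or from the thread.
module SvnStyleCheck
  SAMPLE_SIZE = 1024      # assumption: same sample size as earlier in the thread
  MAX_NON_PRINTING = 0.15 # the 15% threshold from the FAQ quote

  # Assumption: "printing" means bytes 32..126 plus tab, LF and CR.
  def self.printing?(byte)
    (32..126).cover?(byte) || [9, 10, 13].include?(byte)
  end

  def self.binary?(name)
    sample = File.open(name, "rb") { |io| io.read(SAMPLE_SIZE) }
    return false if sample.nil? || sample.empty? # assumption: empty file is text

    bytes = sample.bytes
    return true if bytes.include?(0) # any zero byte means binary

    non_printing = bytes.count { |b| !printing?(b) }
    non_printing.to_f / bytes.size > MAX_NON_PRINTING
  end
end
```

Compared with the ratio rules earlier in the thread, the zero-byte test makes this stricter for formats that contain NUL bytes, while plain text with a moderate share of high bytes still passes as text.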