Christian Heimes
2/12/2010 2:14:00 PM
Lloyd Zusman wrote:
> .... The -T and -B switches work as follows. The first block or so
> .... of the file is examined for odd characters such as strange control
> .... codes or characters with the high bit set. If too many strange
> .... characters (>30%) are found, it's a -B file; otherwise it's a -T
> .... file. Also, any file containing null in the first block is
> .... considered a binary file. [ ... ]
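That's a butt-ugly heuristic that will lead to lots of false positives
if your text happens to be UTF-16 encoded (every ASCII-range character
carries a null byte) or is non-English text encoded as UTF-8 (in
non-Latin scripts nearly every byte has the high bit set).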
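
To make the failure concrete, here is a rough Python sketch of the heuristic as the quoted text describes it. It is not Perl's actual implementation: the function name looks_binary, the set of control codes treated as harmless, and the handling of empty input are all my own assumptions.

```python
def looks_binary(block: bytes, threshold: float = 0.30) -> bool:
    """Sketch of the quoted -T/-B heuristic; not Perl's real code."""
    if not block:
        # Assumption: treat an empty block as text.
        return False
    # Any null byte in the first block marks the file as binary.
    if b"\x00" in block:
        return True
    # Assumption: "odd" means the high bit is set, or the byte is a
    # control code other than these common ones (BEL, BS, TAB, LF,
    # VT, FF, CR, ESC).
    harmless_controls = {7, 8, 9, 10, 11, 12, 13, 27}
    odd = sum(
        1 for b in block
        if b >= 0x80 or (b < 0x20 and b not in harmless_controls)
    )
    # More than 30% odd bytes means binary.
    return odd / len(block) > threshold


# Plain ASCII is classified as text.
print(looks_binary(b"hello world\n"))
# UTF-16 text contains null bytes, so it is flagged as binary.
print(looks_binary("hello world".encode("utf-16")))
# Russian text in UTF-8 is almost all high-bit bytes, so it is flagged too.
print(looks_binary("Привет, мир".encode("utf-8")))
```

Both of the last two inputs are perfectly ordinary text, yet the sketch reports them as binary.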
Christian