comp.lang.ruby

Document identification

M. Eteum

6/1/2005 3:59:00 PM

Dear Ruby gurus,
Is there a way to identify a document from its header? I have a
bunch of documents collected over the years from multiple platforms
(Mac, Windows, and various Unix/Linux variants), and some of them
do not have a file extension. Is there a list that tells us what header
to expect for certain document types, e.g. txt, rtf, pdf, jpg, mpg,
Word, Excel, Visio, etc.?

Thanks
10 Answers

Robin Stocker

6/1/2005 4:39:00 PM

M. Eteum wrote:
> Is there a way to identify any documents from its header? I have a
> bunch of document collected over the year from multi platform system,
> Mac, Windows, and various unix/linux variant where some of the document
> does not have file extension. Are there a list that tells us what header
> should we expect for certain documents e.g. txt, rtf, pdf, jpg, mpg,
> word, excel, visio, etc ...

Hi,

On a Unix system you could use the "file" command. It can detect
file types even when there is no extension.
I don't know whether a Ruby module exists for this purpose, though.

Regards,
Robin


Austin Ziegler

6/1/2005 5:29:00 PM

On 6/1/05, Robin Stocker <robin-lists-ruby-talk@nibor.org> wrote:
> M. Eteum wrote:
> > Is there a way to identify any documents from its header? I have a
> > bunch of document collected over the year from multi platform system,
> > Mac, Windows, and various unix/linux variant where some of the document
> > does not have file extension. Are there a list that tells us what header
> > should we expect for certain documents e.g. txt, rtf, pdf, jpg, mpg,
> > word, excel, visio, etc ...
> On a Unix system you could use the "file" command, it is able to detect
> file types even when there's no extension.
> I don't know if a Ruby module exists for this purpose though.

Not yet. ;) I do plan on adding it to MIME::Types in the future.

-austin
--
Austin Ziegler * halostatue@gmail.com
* Alternate: austin@halostatue.ca


M. Eteum

6/1/2005 6:01:00 PM

Robin Stocker wrote:
> M. Eteum wrote:
>
>> Is there a way to identify any documents from its header? I have
>> a bunch of document collected over the year from multi platform
>> system, Mac, Windows, and various unix/linux variant where some of the
>> document does not have file extension. Are there a list that tells us
>> what header should we expect for certain documents e.g. txt, rtf, pdf,
>> jpg, mpg, word, excel, visio, etc ...
>
>
> Hi,
>
> On a Unix system you could use the "file" command, it is able to detect
> file types even when there's no extension.
> I don't know if a Ruby module exists for this purpose though.
>
> Regards,
> Robin
>
>
Thanks for the reply.

I'm running on Windows as well as Mac, and we exchange files between the
two. A Ruby module that handles this would have been nice, but
I'll take anything for now.

Thanks again

M. Eteum

6/1/2005 6:04:00 PM

Austin Ziegler wrote:
> On 6/1/05, Robin Stocker <robin-lists-ruby-talk@nibor.org> wrote:
>
>>M. Eteum wrote:
>>
>>> Is there a way to identify any documents from its header? I have a
>>>bunch of document collected over the year from multi platform system,
>>>Mac, Windows, and various unix/linux variant where some of the document
>>>does not have file extension. Are there a list that tells us what header
>>>should we expect for certain documents e.g. txt, rtf, pdf, jpg, mpg,
>>>word, excel, visio, etc ...
>>
>>On a Unix system you could use the "file" command, it is able to detect
>>file types even when there's no extension.
>>I don't know if a Ruby module exists for this purpose though.
>
>
> Not yet. ;) I do plan on adding it to MIME::Types in the future.
>
> -austin

Super! Oh, by the way, do you know if Perl or Python has it? I'm quite
desperate for a solution, so I'll take anything that works while
waiting for the Ruby module.

Thanks

Ilmari Heikkinen

6/1/2005 6:48:00 PM

On Wed, 2005-06-01 at 19:00, M. Eteum wrote:
> Dear Ruby Guru:
> Is there a way to identify any documents from its header? I have a
> bunch of document collected over the year from multi platform system,
> Mac, Windows, and various unix/linux variant where some of the document
> does not have file extension. Are there a list that tells us what header
> should we expect for certain documents e.g. txt, rtf, pdf, jpg, mpg,
> word, excel, visio, etc ...
>
> Thanks

Hello,

If you have the shared-mime-info database installed
( http://freedesktop.org/wiki/Software_2fshared_2dm... )
you can use this: http://www.code-monkey.de/projects/mimeI...
Or my extended version: http://dark.fhtr.org/mime_info...

From the README:

MimeInfo class provides an interface to query freedesktop.org's
shared-mime-info database. It can be used to guess a filename's
Mimetype and to get the description for the Mimetype.

require 'mime_info'

info = MimeInfo.get('foo.xml') #=> Mimetype['text/xml']
info.description
#=> "eXtensible Markup Language document"
info.description("de") #=> "XML-Dokument"

info2 = MimeInfo.get('foo.rb') #=> Mimetype['application/x-ruby']
info2.description #=> "Ruby script"
info2.is_a? Mimetype['text/plain'] #=> true

t = Mimetype['audio/x-mp3'] #=> Mimetype['audio/x-mp3']
t.description #=> "MP3 audio"
t.description('cy') #=> "Sain MP3"
t.descriptions['fr'] #=> "audio MP3"
t == Mimetype['audio']['x-mp3'] #=> true
t.is_a? Mimetype['audio'] #=> true
t.ancestors #=> [Mimetype['audio/x-mp3'], Mimetype['audio'],
# Mimetype['application/octet-stream'], Mimetype,
# Module, Object, Kernel]


HTH,

Ilmari



Austin Ziegler

6/1/2005 8:34:00 PM

On 6/1/05, Ilmari Heikkinen <kig@misfiring.net> wrote:
> On Wed, 2005-06-01 at 19:00, M. Eteum wrote:
> > Dear Ruby Guru:
> > Is there a way to identify any documents from its header? I have a
> > bunch of document collected over the year from multi platform system,
> > Mac, Windows, and various unix/linux variant where some of the document
> > does not have file extension. Are there a list that tells us what header
> > should we expect for certain documents e.g. txt, rtf, pdf, jpg, mpg,
> > word, excel, visio, etc ...
> >
> > Thanks

Most of this is covered by MIME::Types on RubyForge. However, the OP
indicated that the problem was related to NOT having proper filename
extensions. The OP wants to look for magic numbers and strings.

-austin
--
Austin Ziegler * halostatue@gmail.com
* Alternate: austin@halostatue.ca
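
To illustrate what "looking for magic numbers" means, here is a minimal Ruby sketch, not the MIME::Types API. The names SIGNATURES and sniff_type are hypothetical, and the table holds only a handful of widely documented leading byte sequences. Legacy Word, Excel and Visio files all share the same OLE2 container signature, so this check alone cannot tell them apart, and plain text has no magic number at all.

```ruby
# Hypothetical signature table: [label, leading bytes]. Deliberately short;
# a real tool such as `file` knows thousands of patterns.
SIGNATURES = [
  ['pdf',  '%PDF-'],
  ['rtf',  '{\rtf'],
  ['jpeg', "\xFF\xD8\xFF"],
  ['png',  "\x89PNG\r\n\x1A\n"],
  ['gif',  'GIF8'],
  ['zip',  "PK\x03\x04"],
  ['ole2', "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1"], # legacy Word/Excel/Visio container
  ['mpeg', "\x00\x00\x01\xBA"],                 # MPEG program stream
  ['mpeg', "\x00\x00\x01\xB3"]                  # MPEG video sequence header
].map { |label, magic| [label, magic.b] }.freeze

# Returns a label for the first matching signature, or nil if none matches.
def sniff_type(path)
  # Read only the first few bytes, in binary mode; .to_s covers an empty file.
  head = File.binread(path, 16).to_s
  match = SIGNATURES.find { |_label, magic| head.start_with?(magic) }
  match && match.first
end
```

Calling sniff_type('fire') on an extension-less file returns one of the labels above or nil. Binary mode matters on Windows, where text mode would translate line endings and could corrupt the bytes being compared.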


Ilmari Heikkinen

6/1/2005 8:56:00 PM

On Wed, 2005-06-01 at 23:33, Austin Ziegler wrote:
> On 6/1/05, Ilmari Heikkinen <kig@misfiring.net> wrote:
> > On Wed, 2005-06-01 at 19:00, M. Eteum wrote:
> > > Dear Ruby Guru:
> > > Is there a way to identify any documents from its header? I have a
> > > bunch of document collected over the year from multi platform system,
> > > Mac, Windows, and various unix/linux variant where some of the document
> > > does not have file extension. Are there a list that tells us what header
> > > should we expect for certain documents e.g. txt, rtf, pdf, jpg, mpg,
> > > word, excel, visio, etc ...
> > >
> > > Thanks
>
> Most of this is covered by MIME::Types on RubyForge. However, the OP
> indicated that the problem was related to NOT having proper filename
> extensions. The OP wants to look for magic numbers and strings.
>

Shared-mime-info does this as well, though it may fare worse than file in
some cases.

kig@bauhaus:~$ mv fire.avi fire
kig@bauhaus:~$ irb
irb(main):001:0> require 'mime_info'
=> true
irb(main):002:0> MimeInfo.get('fire')
=> Mimetype['video/x-msvideo']




Martin DeMello

6/2/2005 6:51:00 AM

M. Eteum <meteum@yahoo.com> wrote:
>
> Super! Oh by the way, do you know if Perl or Python has it? I'm quite
> desperate to find the solution, therefore I'll take any solution while
> waiting for the Ruby modules.

Your best bet would be to find a Windows port of Unix's 'file' (Mac OS X
is bound to have it already). Sadly, it's a very hard thing to google
for :)

martin

Martin DeMello

6/2/2005 7:00:00 AM

Martin DeMello <martindemello@yahoo.com> wrote:
> M. Eteum <meteum@yahoo.com> wrote:
> >
> > Super! Oh by the way, do you know if Perl or Python has it? I'm quite
> > desperate to find the solution, therefore I'll take any solution while
> > waiting for the Ruby modules.
>
> Your best bet would be to find a windows port of unix's 'file' (Mac OSX
> is definitely bound to have it). Sadly, it's a very hard thing to google
> for :)

You're in luck - GnuWin32 includes a port of file.

http://gnuwin32.sourceforge.net/su...

All you need to do is a = `file.exe "#{filename}"` (the quotes keep
filenames containing spaces in one piece).

martin
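
A slightly more defensive way to shell out, as a sketch only: Open3.capture2 from Ruby's standard library takes the program and its arguments separately, so no shell is involved and filenames with spaces or quotes need no escaping. The helper name describe_with_file and the default command are assumptions. On Windows you would presumably pass something like ['file.exe', '-b'], assuming the GnuWin32 port is on your PATH and accepts -b (brief output) as the Unix version does.

```ruby
require 'open3'

# Hypothetical helper: runs an external type-detection command on `path`
# and returns its trimmed output, or nil if the command is missing or fails.
def describe_with_file(path, command: ['file', '-b'])
  # Program and arguments are passed as separate strings, bypassing the shell.
  output, status = Open3.capture2(*command, path)
  status.success? ? output.strip : nil
rescue Errno::ENOENT
  # The executable itself could not be found.
  nil
end
```

Usage would look like describe_with_file('fire'). The exact wording of the description depends on the installed version of file and its magic database.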

M. Eteum

6/2/2005 4:35:00 PM

Ilmari Heikkinen wrote:
> On Wed, 2005-06-01 at 23:33, Austin Ziegler wrote:
>
>>On 6/1/05, Ilmari Heikkinen <kig@misfiring.net> wrote:
>>
>>>On Wed, 2005-06-01 at 19:00, M. Eteum wrote:
>>>
>>>>Dear Ruby Guru:
>>>> Is there a way to identify any documents from its header? I have a
>>>>bunch of document collected over the year from multi platform system,
>>>>Mac, Windows, and various unix/linux variant where some of the document
>>>>does not have file extension. Are there a list that tells us what header
>>>>should we expect for certain documents e.g. txt, rtf, pdf, jpg, mpg,
>>>>word, excel, visio, etc ...
>>>>
>>>>Thanks
>>
>>Most of this is covered by MIME::Types on RubyForge. However, the OP
>>indicated that the problem was related to NOT having proper filename
>>extensions. The OP wants to look for magic numbers and strings.
>>
>
>
> Shared-mime-info does this aswell. Though it may fare worse than file in
> some cases.
>
> kig@bauhaus:~$ mv fire.avi fire
> kig@bauhaus:~$ irb
> irb(main):001:0> require 'mime_info'
> => true
> irb(main):002:0> MimeInfo.get('fire')
> => Mimetype['video/x-msvideo']
>
>
>
>
Thanks, but where do I get mime_info.rb? I'm running "ruby 1.8.2
(2004-12-25) [i386-mswin32]" and it does not seem to include the
necessary files.

Thanks