comp.lang.ruby

[ANN] Metadata 0.3

Ilmari Heikkinen

9/15/2007 7:09:00 AM

On 9/15/07, Konrad Meyer <konrad@tylerc.org> wrote:
> Quoth Ilmari Heikkinen:
> > On 9/14/07, Konrad Meyer <konrad@tylerc.org> wrote:
> > > Hmm, am I not seeing it (just using 'mdh -p') or can metadata.rb extract
> > > stuff like artist, title, album, track, and whatnot from ogg/flac?
> >
> > It should at least. If you're having trouble, lemme know
> >
> Yeah, I'm having some trouble. I have latest metadata (0.2).
>
> [snip]
>
> Any ideas?

Yeah, I failed at using git. Jeez. Sorry about that.
Here's 0.3, it oughta work:

tarball: http://dark.fhtr.org/repo.../metadata-...
git: http://dark.fhtr.org/repo...


On 9/15/07, darren kirby <bulliver@badcomputer.org> wrote:
> Hi Ilmari!
>
> Just wanted to mention that despite the name, wmainfo will parse anything
> wrapped in an ASF audio/video container format[0], so, you could use it to
> parse wmv movies as well if your user didn't have mplayer installed.
>
> [0] http://en.wikipedia.org/wiki/Advanced_Syst...
>

Thanks for the pointer!
I made it merge the wmainfo output into the mplayer output for wmv and asf.


Description
-----------

This package `Metadata' comes with a library called `metadata' and
a small program called `mdh'.

The library probes files for their metadata (e.g. jpeg dimensions
and camera make, mp3 artist, pdf word count) and returns the metadata
as a Hash.

Mdh can print out file metadata as YAML and package the metadata
with the file.

This package has many dependencies since there is no single universal
metadata header format that all files use. Blame resource forks, filename
extensions, bags of bytes and mimetypes.


Usage
-----

# print out metadata header
mdh -p myfile.jpg

# create myfile.jpg.mdh, which consists of metadata header + myfile.jpg
mdh myfile.jpg

# print out metadata header from mdh file
mdh -e -p myfile.jpg.mdh

# strip out metadata header from mdh file and save it to myfile.jpg
mdh -e myfile.jpg.mdh

irb> Metadata.extract('myfile.jpg')
irb> Metadata.extract_text('myfile.jpg')
irb> Pathname.new("myfile.jpg").metadata
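
For readers wondering what "metadata header + myfile.jpg" could look like on disk, here is a minimal Ruby sketch. The layout (a 4-byte length, a YAML header, then the untouched file bytes) and the names pack_with_header and unpack_with_header are assumptions made for illustration; the post does not describe mdh's actual file format.

```ruby
require "yaml"
require "stringio"

# Hypothetical container layout (NOT necessarily what mdh writes):
#   4-byte big-endian header length | YAML header | original file bytes
def pack_with_header(metadata, payload)
  header = YAML.dump(metadata).b
  [header.bytesize].pack("N") + header + payload.b
end

# Split a packed blob back into the metadata Hash and the original bytes.
def unpack_with_header(blob)
  io = StringIO.new(blob.b)
  header_size = io.read(4).unpack1("N")
  header = io.read(header_size).force_encoding("UTF-8")
  metadata = YAML.safe_load(header)
  payload = io.read
  [metadata, payload]
end
```

Keeping the payload byte-for-byte intact after the header is what makes a strip step like `mdh -e` lossless in this sketch.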


List of supported formats
-------------------------

Audio:
  Successfully tested with:
    mp3, flac, ogg, wav
  Should also work:
    wma, m4a

Video:
  Whatever you manage to make mplayer play, which can be just about anything.
  Title and author data, etc. are missing, though (do videos even have those?)
  Successfully tested with:
    wmv, mov, divx, xvid, flv, ogm, mpg

Images:
  Should handle pretty much anything (apart from XCF and ORF).
  Successfully tested with:
    jpeg, png, gif, nef, dng, crw, pef, psd

Documents:
  Successfully tested with:
    pdf, ppt, odp, sxi, ps, ps.gz, html, txt
  Should work:
    - OpenOffice docs work to some degree (personally, I'm using unoconv to
      convert OO docs to temp PDFs for the text & dimensions extraction, so
      those bits of data are missing.)
    - MS Office docs to some degree (ppt at least; doc and xls should work too;
      dimensions are missing due to the temp PDF thing above.)

Others:
  Whatever extract spits out for the five or six bits of metadata I'm using
  from it. Archive contents at least.

Requirements
------------

* Ruby 1.8

* Tons of metadata extraction programs and libs,
  list of gems:
    flacinfo-rb
    wmainfo-rb
    MP4info
  list of debian packages:
    dcraw
    libimlib2-ruby
    extract
    libimage-exiftool-perl
    poppler-utils
    mplayer
    html2text
    imagemagick
    unhtml
    pstotext
    antiword
    catdoc
    shared-mime-info
    vorbis-tools

* You do want to install the latest versions of dcraw and
  shared-mime-info to be able to handle camera raw images.
    http://cybercom.net/~dcof...
    http://freedesktop.org/wiki/Software/shared...

* Python + chardet library
    http://chardet.feedp...

Install
-------

Decompress the archive and enter its top directory.
Then type:

($ su)
# ruby setup.rb

This simple step installs the program under the default
location of Ruby libraries. You can also install files into
your favorite directory by passing setup.rb some options.
Try "ruby setup.rb --help".


License
-------

Ruby's


--
Ilmari Heikkinen <ilmari.heikkinen gmail com>
http://fhtr.bl...

6 Answers

Konrad Meyer

9/15/2007 7:33:00 AM

Quoth Ilmari Heikkinen:
> [snip]

Any chance you could wrap this up as a gem? It's not something I care
strongly about, and I don't know how complicated the process is, but I think
it would help ease installation for some users.

--
Konrad Meyer <konrad@tylerc.org> http://konrad.sobertil...

Konrad Meyer

9/15/2007 8:37:00 AM

Quoth Ilmari Heikkinen:
> [snip]

Er, I'm still not getting information out of ogg files:

$ mdh -p ~/music/bowling_for_soup_-_1985.ogg
---
Video.Duration: 192.78
Audio.Samplerate: 44100
Audio.Bitrate: 192.0
Image.DimensionUnit: px
Video.Codec: ""
File.Size: 4618665
Audio.Codec: vrbs
File.Modified: 2007-01-03T22:10:11-08:00
File.Format: video/x-theora+ogg

$ mplayer ~/music/bowling_for_soup_-_1985.ogg
...
Clip info:
Genre: Pop
Name: 1985
Artist: Bowling for Soup
Creation Date: 2004
Album: A Hangover You Don't Deserve
Track: 03

Thanks for your quick responses!

--
Konrad Meyer <konrad@tylerc.org> http://konrad.sobertil...

Ilmari Heikkinen

9/15/2007 11:39:00 AM

On 9/15/07, Konrad Meyer <konrad@tylerc.org> wrote:
> Er, I'm still not getting information out of ogg files:
>
> $ mdh -p ~/music/bowling_for_soup_-_1985.ogg
> ---
> Video.Duration: 192.78
> Audio.Samplerate: 44100
> Audio.Bitrate: 192.0
> Image.DimensionUnit: px
> Video.Codec: ""
> File.Size: 4618665
> Audio.Codec: vrbs
> File.Modified: 2007-01-03T22:10:11-08:00


> File.Format: video/x-theora+ogg

^- That's the problem there. It thinks it's a video file.

<technical blather>
Why? Probably because I hacked the mimetype guesser to _not_ assume
things based on the filename extension, and the shared-mime-info db
assumes that the guesser _is_ assuming things based on the filename
extension.

Which is something I'd rather not do with downloaded files (which, by
their very nature, have wild disparities between the extension and the
real mimetype.) And the header content-type is often totally wrong or
doesn't match shared-mime-info's naming (e.g.
application/octet-stream vs. image/gif, audio/x-mp3 vs. audio/mpeg,
video/divx vs. video/x-msvideo, video/x-ms-asf vs. video/vnd.ms-asf...)

And this magic-over-extension sometimes leads to me getting generic
lesser-magic guesses instead of more specific filename extension
guesses (e.g. zip instead of OO document.) So, I have a list of
generic formats that defer to the extension rather than rely on
the lesser-magic.

Anyhow, it's ugly, hacky magic.
Just like the rest of mimetype guessing.
</technical blather>
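
The "magic first, extension only for generic formats" idea described above can be sketched roughly like this in Ruby. The magic table, the list of generic types, the extension map and the method names are all made up for illustration and are not taken from metadata.rb or from the shared-mime-info database.

```ruby
# Illustrative magic table: [byte offset, signature bytes, mimetype].
MAGIC = [
  [0, "\x89PNG\r\n\x1A\n".b, "image/png"],
  [0, "GIF8".b,              "image/gif"],
  [0, "%PDF-".b,             "application/pdf"],
  [0, "PK\x03\x04".b,        "application/zip"],
  [0, "OggS".b,              "application/ogg"],
].freeze

# Container-level guesses that say little about the real content.
GENERIC_TYPES = [
  "application/zip",
  "application/ogg",
  "application/octet-stream",
].freeze

# Extension hints, consulted only when the magic guess is generic.
EXTENSION_TYPES = {
  ".odt" => "application/vnd.oasis.opendocument.text",
  ".odp" => "application/vnd.oasis.opendocument.presentation",
  ".ogg" => "audio/x-vorbis+ogg",
  ".ogv" => "video/x-theora+ogg",
}.freeze

# Guess from the first bytes of the file only.
def guess_by_magic(head)
  MAGIC.each do |offset, signature, type|
    return type if head.byteslice(offset, signature.bytesize) == signature
  end
  "application/octet-stream"
end

# Trust specific magic guesses; defer to the extension for generic ones.
def guess_mimetype(filename, head)
  by_magic = guess_by_magic(head.b)
  return by_magic unless GENERIC_TYPES.include?(by_magic)
  EXTENSION_TYPES.fetch(File.extname(filename).downcase, by_magic)
end
```

With this shape a mislabeled download (say, a GIF named picture.ogg) is still reported by its content, while a zip-based OpenOffice file gets the more specific type from its extension.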

But! Fixing this instance of the problem in the next thirty seconds.
... There!

And now, adding ogginfo metadata to video/x-theora+ogg.

Ok, try this:

http://dark.fhtr.org/repos/metadata/metadata-...


> Thanks for your quick responses!

Thanks for the bug reports! They really help in making this thing
more robust.

> Konrad Meyer <konrad@tylerc.org> http://konrad.sobertil...

--
Ilmari Heikkinen
http://fhtr.bl...

Ilmari Heikkinen

9/15/2007 11:42:00 AM

On 9/15/07, Konrad Meyer <konrad@tylerc.org> wrote:

> $ mplayer ~/music/bowling_for_soup_-_1985.ogg
> ...
> Clip info:
> Genre: Pop
> Name: 1985
> Artist: Bowling for Soup
> Creation Date: 2004
> Album: A Hangover You Don't Deserve
> Track: 03

Oh, nice, mplayer does give out metadata fields. I better augment
the mplayer info parser to grab those :)

0.5 here we come!
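
A rough Ruby sketch of what grabbing those fields might look like, assuming the mplayer output has already been captured into a String. The method name parse_clip_info is hypothetical, and the parsing rules are guessed from the output quoted above rather than from any mplayer documentation.

```ruby
# Collect the "Key: value" lines that follow a "Clip info:" line into a Hash.
# The block is assumed to end at the first blank or non "Key: value" line.
def parse_clip_info(output)
  info = {}
  in_block = false
  output.each_line do |line|
    line = line.strip
    if line == "Clip info:"
      in_block = true
    elsif in_block
      key, value = line.split(":", 2)
      break if value.nil? || key.empty?
      info[key.strip] = value.strip
    end
  end
  info
end
```

Values are kept as Strings (so a track of "03" keeps its leading zero); mapping them onto keys like Audio.Artist would be a separate step.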

Konrad Meyer

9/15/2007 7:51:00 PM

Quoth Ilmari Heikkinen:
> On 9/15/07, Konrad Meyer <konrad@tylerc.org> wrote:
>
> > $ mplayer ~/music/bowling_for_soup_-_1985.ogg
> > ...
> > Clip info:
> > Genre: Pop
> > Name: 1985
> > Artist: Bowling for Soup
> > Creation Date: 2004
> > Album: A Hangover You Don't Deserve
> > Track: 03
>
> Oh, nice, mplayer does give out metadata fields. I better augment
> the mplayer info parser to grab those :)
>
> 0.5 here we come!

Another bug (Sorry :D):
$ mdh -p ~/music/Limp\ Bizkit\ -\ Rollin\'\ \(edited\).ogg
sh: -c: line 0: syntax error near unexpected token `('
sh: -c: line 0: `ogginfo '/home/konrad/music/Limp Bizkit - Rollin\'
(edited).ogg''

(Last line was broken up to email length.) You're already escaping single
quotes for the shell, need to escape start-parens and end-parens as well.
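
A note on the quoting, with a small Ruby sketch. In POSIX sh a backslash has no special meaning inside single quotes, so the `\'` in the failing command ends the quoted string early and leaves the parentheses unquoted; if the quote itself is handled correctly, the parentheses should not need separate escaping. The helper names sh_single_quote and capture below are hypothetical, and whether metadata.rb builds its command lines this way is an assumption.

```ruby
require "shellwords"

# Quote a string for POSIX sh: wrap it in single quotes and write every
# embedded single quote as '\'' (close quote, escaped quote, reopen quote).
def sh_single_quote(arg)
  "'" + arg.gsub("'") { "'\\''" } + "'"
end

# Alternative: skip the shell entirely by passing the arguments as an Array,
# so no quoting is needed at all.
def capture(*argv)
  IO.popen(argv, &:read)
end

name = "Limp Bizkit - Rollin' (edited).ogg"
quoted  = sh_single_quote(name)     # for use inside a command string
escaped = Shellwords.escape(name)   # stdlib equivalent, backslash style
```

Something like `capture("ogginfo", name)` would then hand the filename to the program untouched.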

Thanks,

--
Konrad Meyer <konrad@tylerc.org> http://konrad.sobertil...

Konrad Meyer

9/15/2007 7:58:00 PM

Quoth Ilmari Heikkinen:
> On 9/15/07, Konrad Meyer <konrad@tylerc.org> wrote:
>
> > $ mplayer ~/music/bowling_for_soup_-_1985.ogg
> > ...
> > Clip info:
> > Genre: Pop
> > Name: 1985
> > Artist: Bowling for Soup
> > Creation Date: 2004
> > Album: A Hangover You Don't Deserve
> > Track: 03
>
> Oh, nice, mplayer does give out metadata fields. I better augment
> the mplayer info parser to grab those :)
>
> 0.5 here we come!

Also:
For mp3 id3v2 tags, the binary string "\xCB\x99\xC5\xA3" is being inserted
at the front of all the string fields.

$ mdh -p ~/music/Snoop\ Dogg\ -\ Gin\ \&\ Juice.mp3
---
Audio.Album: "\xCB\x99\xC5\xA3Death Row's Snoop Doggy Dogg Greatest Hits (2001)"
...
Audio.Genre: "\xCB\x99\xC5\xA3Hip-Hop"
Audio.Title: "\xCB\x99\xC5\xA3Gin & Juice"
...
Audio.Artist: "\xCB\x99\xC5\xA3Snoop Dogg"

I *think* this is an id3v2 thing. Also, it happens in more than one file and
amaroK sees the tags "correctly", so I'm thinking it's on the metadata's
end. Thanks!
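
One possible explanation, offered as a guess rather than a diagnosis: the four prefix bytes are exactly what a UTF-16 little-endian byte order mark (FF FE) turns into when it is decoded as ISO-8859-2 and re-encoded as UTF-8. ID3v2 text frames start with an encoding byte (0 = ISO-8859-1, 1 = UTF-16 with BOM, and in ID3v2.4 also 2 = UTF-16BE, 3 = UTF-8), so a decoder that honours that byte could look like the Ruby sketch below. The method names are hypothetical, and nothing here is taken from metadata.rb.

```ruby
# Decode the payload of an ID3v2 text frame (encoding byte + text bytes)
# into a UTF-8 String.
def decode_id3v2_text(payload)
  bytes = payload.b
  encoding_byte = bytes.getbyte(0)
  body = bytes.byteslice(1..-1) || "".b

  text =
    case encoding_byte
    when 0 then body.force_encoding("ISO-8859-1").encode("UTF-8")
    when 1 then decode_utf16_with_bom(body)
    when 2 then body.force_encoding("UTF-16BE").encode("UTF-8")
    when 3 then body.force_encoding("UTF-8")
    else raise ArgumentError, "unknown text encoding: #{encoding_byte.inspect}"
    end

  text.chomp("\0") # drop one trailing NUL terminator, if present
end

# Use the byte order mark to pick the UTF-16 flavour, then drop it.
def decode_utf16_with_bom(body)
  if body.start_with?("\xFE\xFF".b)
    body.byteslice(2..-1).force_encoding("UTF-16BE").encode("UTF-8")
  elsif body.start_with?("\xFF\xFE".b)
    body.byteslice(2..-1).force_encoding("UTF-16LE").encode("UTF-8")
  else
    # Assumption: treat BOM-less data as little-endian.
    body.force_encoding("UTF-16LE").encode("UTF-8")
  end
end

# What a UTF-16LE BOM looks like after being misread as ISO-8859-2:
misread_bom = "\xFF\xFE".b.force_encoding("ISO-8859-2").encode("UTF-8")
# misread_bom.bytes are CB 99 C5 A3, the prefix seen in the output above
```

If that is what is happening, stripping the BOM before any charset guessing should make the stray prefix go away.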
--
Konrad Meyer <konrad@tylerc.org> http://konrad.sobertil...