Asp Forum - Library Metadata Storage Format

Trans

7/24/2008 7:25:00 PM

Hi--

I've been trying to decide the best way to store library/project
metadata (name, version, etc.). I use this data for a few different
tools, one of those being Rolls, which is an alternate require system
for Ruby. I've considered using straight Ruby, INI, YAML and multiple
files (one per property).

Ruby scripts are too limited for my purposes b/c they are impossible to
selectively edit with automated tools.

INI files would work well, and they are pretty easy to parse, but they
don't seem to be the Ruby way --where YAML tends to rule the day.
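
To illustrate the "easy to parse" point, here is a minimal sketch of an INI reader in plain Ruby. The method name parse_ini and the sample keys are hypothetical, and it only handles the simple subset (sections, key = value pairs, and ; or # comment lines), not every INI dialect.

```ruby
# Parse a simple INI document into nested hashes.
# Keys that appear before any [section] header go at the top level.
def parse_ini(text)
  data = {}
  section = nil
  text.each_line do |line|
    line = line.strip
    # Skip blank lines and comment lines.
    next if line.empty? || line.start_with?(';', '#')
    if line =~ /\A\[(.+)\]\z/
      section = $1
      data[section] ||= {}
    elsif line =~ /\A([^=]+?)\s*=\s*(.*)\z/
      (section ? data[section] : data)[$1] = $2
    end
  end
  data
end
```

Every value comes back as a string, so a tool that needs typed values would have to convert them itself.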

YAML would seem to be the obvious choice. But I hesitate because Syck is
a fairly heavy dependency for something like Rolls where light weight is
a big advantage. Also automated manipulation of YAML files isn't all
that optimal --round trip a YAML file and formatting can become fairly
distorted.

My last option, of per-property files, is appealing b/c it requires no
special parser library, and such files are very easy to manipulate. The
downside of course is that a dozen or so little files can seem a bit
unwieldy and can waste file system space (depending on block size).
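
As a rough sketch of the per-property idea, assuming a hypothetical layout where each file's name is the key and its contents are the value (for example meta/name and meta/version), reading everything back takes only core Ruby:

```ruby
# Load every regular file in dir into a hash of
# { file name => stripped file contents }.
# The directory layout is an assumption made for illustration.
def load_metadata(dir)
  metadata = {}
  Dir.glob(File.join(dir, '*')).sort.each do |path|
    next unless File.file?(path)
    metadata[File.basename(path)] = File.read(path).strip
  end
  metadata
end
```

No parser library is loaded at all, which is the property that matters for something meant to stay light.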

So, as you can see I'm torn. Should I take the high road, and not worry
about YAML's heft, or the low road and not worry about the
unconventional use of many small files, or is there a better road
altogether?

--7rans
--
Posted via http://www.ruby-....

2 Answers

David Masover

7/26/2008 4:50:00 AM

On Thursday 24 July 2008 14:25:09 Thomas Sawyer wrote:

> YAML would seem to be the obvious choice. But I hesitate because Syck is
> a fairly heavy dependency for something like Rolls where light weight is
> a big advantage.

YAML is also in the standard library, so if you're targeting anything
resembling a standard distribution of Ruby, it'll be there.

> Also automated manipulation of YAML files isn't all
> that optimal --round trip a YAML file and formatting can become fairly
> distorted.

Because YAML is a serialization format. They make good config files, but do
you really need to support comments in the file? Your other proposal doesn't
seem to allow for that, anyway...

> The
> downside of course is that a dozen or so little files can seem a bit
> unwieldy and can waste file system space (depending on block size).

My attitude is, do what's convenient, and let the filesystem worry about disk
space. Some filesystems support concepts like "sub blocks" and "tail packing"
which can lead to quite efficient storage of small files.

Worry more about the usability of it. If you litter the project with small
files, is that going to be annoying for users? I know one of the selling
points of git over SVN is that git stores one .git folder at the top of the
checkout, whereas SVN stores a .svn folder in every directory of the
checkout.

Trans

7/27/2008 4:36:00 PM

Thanks for the response, David. Thinking through these issues all by
myself, I get a little stir crazy, so getting some feedback like this
really helps.

David Masover wrote:
> YAML is also in the standard library, so if you're targeting anything
> resembling a standard distribution of Ruby, it'll be there.
>
>> Also automated manipulation of YAML files isn't all
>> that optimal --round trip a YAML file and formatting can become fairly
>> distorted.
>
> Because YAML is a serialization format. They make good config files, but do
> you really need to support comments in the file? Your other proposal doesn't
> seem to allow for that, anyway...

Hmm... that's true. That's part of the issue really. To be more specific, I
want to automate version bumping. If I rewrite the whole metadata.yaml
file to update the version entry, you are right, bye bye comments.
Another option is to hack a regexp solution. It would probably work ok
most of the time. But that's a hacky band-aid kind of fix.
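
For concreteness, a minimal sketch of what such a regexp bump might look like. It assumes a top-level, unquoted "version: x.y.z" entry on a single line; the method name bump_patch is hypothetical.

```ruby
# Increment the patch number of a top-level "version: x.y.z" line.
# The text is edited in place rather than parsed and re-dumped, so
# comments and layout elsewhere in the file are left alone.
def bump_patch(yaml_text)
  yaml_text.sub(/^(version:[ \t]*)(\d+)\.(\d+)\.(\d+)/) do
    "#{$1}#{$2}.#{$3}.#{$4.to_i + 1}"
  end
end
```

This is exactly where the "most of the time" caveat bites: a quoted version string, a version nested under another key, or a value split across lines would slip past the pattern and the text would come back unchanged.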

>> The
>> downside of course is that a dozen or so little files can seem a bit
>> unwieldy and can waste file system space (depending on block size).
>
> My attitude is, do what's convenient, and let the filesystem worry about disk
> space. Some filesystems support concepts like "sub blocks" and "tail packing"
> which can lead to quite efficient storage of small files.

Good point. Leave storage to the storage guys. It would only be a dozen
files or so, so we're not talking a whole lot of space anyway.

> Worry more about the usability of it. If you litter the project with small
> files, is that going to be annoying for users? I know one of the selling
> points of git over SVN is that git stores one .git folder at the top of the
> checkout, whereas SVN stores a .svn folder in every directory of the
> checkout.

Yea, I hate that about svn. This won't be a problem here; the files
would be in one special directory. It can be annoying to edit them all,
at least for the first go round, but after that they rarely change. The
other thing is tools that might want to scrape project info (a la
CPAN's META.yml). I wonder if it would be too much trouble for this
use case to have to fetch multiple files (of course, I could always
generate an index file based on the separate files).
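
A sketch of how that index could be generated, assuming the same hypothetical one-file-per-property directory. It does load YAML, but only in the tool that builds the index, not in the code that reads individual properties.

```ruby
require 'yaml'

# Gather every regular file in dir into one YAML document so a
# scraping tool can fetch a single file. build_index is a
# hypothetical name, not part of any existing tool.
def build_index(dir)
  names = Dir.children(dir).sort.select { |n| File.file?(File.join(dir, n)) }
  data = names.each_with_object({}) do |name, hash|
    hash[name] = File.read(File.join(dir, name)).strip
  end
  YAML.dump(data)
end
```

Since the index is always regenerated from the separate files, nobody edits it by hand and the round-trip formatting problem never comes up for it.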

I came across another good reason to use separate files -- say I use a
generator (e.g. RubiGen) to scaffold out a license. It would add the
LICENSE (or COPYING) file to my project, but it would also want to
update the metadata entry. That's easy if there's a separate
meta/license file. If there were a reliable way to update a YAML file in
a piecemeal fashion, then this wouldn't be much of an issue; but without
that... well I guess I'm just not sure how comfortable I feel with a
"usually works" regexp hack.

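A sketch of the separate-file update a generator could perform, with a hypothetical helper name and a meta/ directory assumed only for illustration:

```ruby
require 'fileutils'

# Write a single property file, creating the directory if needed.
# Other property files are never read or rewritten.
def set_property(dir, name, value)
  FileUtils.mkdir_p(dir)
  File.write(File.join(dir, name), "#{value}\n")
end
```

A call such as set_property('meta', 'license', 'MIT') would touch only meta/license, so there is nothing to parse and no formatting to disturb.
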
T.

comp.lang.ruby
