Austin Ziegler
11/24/2004 3:46:00 AM
On Wed, 24 Nov 2004 11:29:51 +0900, Francis Hwang <sera@fhwang.net>
wrote:
> More generally, let me ask: What formats are people using to
> persist Ruby objects to disk? What are pluses and minuses? I can't
> figure out when I should use something like YAML and when I should
> use the Marshal module and if there's anything else out there that
> people are using, I'm probably going to get sort of confused but
> should probably hear about them anyway.
Well, I should qualify my statements some, because I don't want to
malign _why's work, which is, ultimately, nothing short of amazing.
For short configuration items, YAML is rather impressive. It is
human editable, it's rich, and it's reasonably stable.
However, syck -- the YAML parser built into Ruby -- is unusable in
Ruby 1.8.1 (which is still the officially released version of Ruby)
on Windows and has other issues in versions of Ruby up through 1.8.2
preview 2. I have not yet verified whether the problems I had with
syck in 1.8.2p2 have been fixed; I haven't been in a position to
test them lately.
Ruwiki has specific needs that may or may not be present in a
generic application that needs something externally editable. In
particular, Ruwiki files can be large because of the content --
which is a large \n-delimited string. For a Ruwiki file at work and
for the Ruwiki::WikiMarkup pages, syck failed (as in coredump) in
Ruby 1.8.1 on Windows on writing strings longer than about 7k and
reading strings longer than about 4k. I don't think it was a
Windows-only problem. In Ruby 1.8.2p2, syck was confused by embedded
\n characters and sometimes escaped them when it shouldn't have,
producing unusable output.
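For comparison, here is a minimal sketch of round-tripping a
page-like hash with a long \n-delimited string through both Marshal
and YAML. It assumes a current Ruby, whose bundled YAML library is
Psych rather than syck, so it says nothing about the 1.8.1/1.8.2p2
failures described above; the `page` hash and its keys are
hypothetical.

```ruby
require 'yaml'

# Hypothetical page record: the body is one long "\n"-delimited string,
# well past the ~4k/7k sizes mentioned above.
page = {
  'title'   => 'ProjectIndex',
  'content' => (1..500).map { |i| "Line #{i} of the page." }.join("\n")
}

# Marshal: compact binary; fast, but not something you can fix in an editor.
binary = Marshal.dump(page)
from_marshal = Marshal.load(binary)

# YAML: plain text that a human can read and edit.
text = YAML.dump(page)
from_yaml = YAML.load(text)

puts "content size: #{page['content'].size} chars"
puts "Marshal round trip ok: #{from_marshal == page}"
puts "YAML round trip ok:    #{from_yaml == page}"
```

Note that Marshal.load should only ever be fed data you trust, and
that a Marshal dump is tied to the Marshal format version of the Ruby
that wrote it, which is part of the "binary mud" worry quoted below.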
Because of this, and because I needed a human-editable format that
was simple, reasonably quick, and reliable, I created what has since
become Ruwiki::Exportable. This is a general-purpose markup, but it
is not a type-smart format (like YAML), leaving it up to the reader
and the writer to ensure that the data it writes out will be
meaningfully read back in. The format looks something like this (this
is actually the default ruwiki.conf):
ruwiki-config!css: ruwiki.css
ruwiki-config!date-format: %Y.%m.%d
ruwiki-config!datetime-format: %Y.%m.%d %H:%M:%S
ruwiki-config!debug: true
ruwiki-config!default-page: ProjectIndex
ruwiki-config!default-project: Default
ruwiki-config!language: en
ruwiki-config!storage-options: flatfiles!data-path: ./data
	flatfiles!extension: ruwiki
ruwiki-config!storage-type: flatfiles
ruwiki-config!template-path: ./templates/
ruwiki-config!template-set: default
ruwiki-config!time-format: %H:%M:%S
ruwiki-config!title: Ruwiki
ruwiki-config!webmaster: webmaster@domain.tld
webrick-config!addresses:
webrick-config!do-log: true
webrick-config!log-dest: <STDERR>
webrick-config!mount: /
webrick-config!port: 8808
webrick-config!threads: 1
When read, this will look like (in Ruby):
{ 'ruwiki-config' =>
    { 'css'         => 'ruwiki.css',
      'date-format' => "%Y.%m.%d",
      #...
    },
  'webrick-config' =>
    { 'addresses' => "",
      #...
    }
}
I have to know that webrick-config!addresses, which reads back as "",
actually means []. I have to know in a .ruwiki file that
"properties!edit-date: 1101267172" actually means "Tue Nov 23
22:32:52 Eastern Standard Time 2004" because it's Time.now.to_i.
I also have to know that the value listed for "storage-options" is
actually a nested Ruwiki::Exportable document. The formal definition
for a Ruwiki::Exportable document is:
<group-id>!<item-id>:<1whitespace>VALUE
[<1whitespace>CONTINUED VALUE]*
That is, to continue a value onto the next line, I omit the
<group-id> and indent the continued text with exactly ONE whitespace
character -- either a tab or a space. (Originally only tabs were
allowed, and a tab is still what Ruwiki::Exportable writes by default
when exporting to a string.)
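The grammar above is simple enough to parse in a few lines. What
follows is a hypothetical sketch, not Ruwiki::Exportable's actual
code: `parse_exportable`, `dump_exportable` and `ENTRY` are made-up
names, every value is kept as a plain String (the format itself is
not type-aware), and a continuation line has exactly one leading tab
or space stripped before being appended.

```ruby
# Hypothetical reader/writer for "<group-id>!<item-id>: VALUE" documents.
# Not Ruwiki's implementation; all values stay Strings.
ENTRY = /\A([^!\s]+)!([^:]+):(?:[ \t](.*))?\z/

def parse_exportable(text)
  result = {}
  current = nil # the String value that continuation lines append to

  text.each_line do |raw|
    line = raw.chomp
    if line.start_with?(" ", "\t")
      # Continuation: drop exactly ONE leading whitespace character.
      raise ArgumentError, "continuation with no entry: #{line.inspect}" unless current
      current << "\n" << line[1..-1]
    elsif (m = ENTRY.match(line))
      group, item, value = m[1], m[2], m[3].to_s.dup
      (result[group] ||= {})[item] = value
      current = value
    elsif !line.strip.empty?
      raise ArgumentError, "unparseable line: #{line.inspect}"
    end
  end
  result
end

def dump_exportable(hash)
  lines = []
  hash.sort.each do |group, items|
    items.sort.each do |item, value|
      first, *rest = value.split("\n", -1)
      entry = "#{group}!#{item}:"
      entry += " #{first}" unless first.nil? || first.empty?
      lines << entry
      # Continuations are written with a single leading tab.
      rest.each { |cont| lines << "\t#{cont}" }
    end
  end
  lines.join("\n") + "\n"
end
```

Applied to the default ruwiki.conf shown earlier (with its
continuation line indented), this sketch gives
`conf['webrick-config']['port'] == "8808"`, and the storage-options
value can be fed straight back in: parsing it again yields
`{"flatfiles" => {"data-path" => "./data", "extension" => "ruwiki"}}`,
which is the nested-document behaviour described above.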
YAML handles some of this type information itself, and most of the
problems I have seen come from syck guessing a type wrong or otherwise
mangling the data.
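To make the type-guessing point concrete, here is a small sketch
against a current Ruby, whose YAML library is Psych rather than the
syck discussed here. The keys loosely follow ruwiki.conf, plus a
made-up `version` key; the point is only that unquoted scalars in a
hand-edited file get their types from the parser, and that quoting is
how an editor opts out.

```ruby
require 'yaml'

# A hand-edited YAML file: unquoted (plain) scalars have their types inferred.
doc = YAML.load(<<~DOC)
  port: 8808
  debug: true
  addresses:
  version: 1.10
  quoted_version: '1.10'
DOC

# "version" comes back as the Float 1.1 (trailing zero lost), while the
# quoted copy stays the String "1.10"; "addresses" with no value is nil.
doc.each { |key, value| puts "#{key}: #{value.inspect} (#{value.class})" }
```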
I'm actually quite pleased with Ruwiki::Exportable -- it's
reasonably fast and simple to understand, but it is one level
removed from the actual object (it uses a hash as the canonical
format for the data, not the object itself, as Marshal and YAML do).
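Because the canonical form is a hash of strings, the type knowledge
described above has to live in the code that reads it. A minimal
sketch of that layer, assuming the raw hash looks like the one shown
earlier; `WebrickSettings`, `webrick_settings` and `edit_date` are
hypothetical names, and treating a non-empty `addresses` value as
whitespace-separated is an assumption, not something the format
defines.

```ruby
# Hypothetical typed view over the raw { group => { item => String } } hash.
WebrickSettings = Struct.new(:addresses, :port, :threads, :do_log)

def webrick_settings(raw)
  group = raw.fetch('webrick-config')
  WebrickSettings.new(
    # "" was written for an empty list; "".split returns [].
    # ASSUMPTION: multiple addresses are whitespace-separated.
    group.fetch('addresses', '').split,
    Integer(group.fetch('port')),
    Integer(group.fetch('threads', '1')),
    group.fetch('do-log', 'false') == 'true'
  )
end

# properties!edit-date holds Time#to_i seconds, so Time.at reverses it.
def edit_date(raw)
  Time.at(Integer(raw.fetch('properties').fetch('edit-date')))
end
```

With the example value above, `edit_date` turns 1101267172 into
2004-11-24 03:32:52 UTC, which is the 22:32:52 EST time on Nov 23.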
I prefer YAML to XML for most editable configuration files, but YAML
has its own issues.
-austin
> On Nov 23, 2004, at 4:49 PM, Austin Ziegler wrote:
>> On Wed, 24 Nov 2004 02:35:21 +0900, David Heinemeier Hansson
>> <david@loudthinking.com> wrote:
>>> * Switching from Marshal to YAML outputs. The biggest fear in
>>> Instiki is that something horrible goes wrong with the
>>> persisted data and all is lost in a big ball of Marshalled,
>>> binary mud. Having a humanly readable format would be very
>>> nice. This needs A LOT of testing, though.
>> I recommend *not* using YAML. In my experiences with Ruwiki, YAML
>> support is not yet stable enough -- especially for large data --
>> for general use.
--
Austin Ziegler * halostatue@gmail.com
* Alternate: austin@halostatue.ca