Trans
12/9/2006 11:01:00 PM
Joel VanderWerf wrote:
> Trans wrote:
> > what's the best way to determine if a file is yaml?
>
> In light of the other responses, which show how hard it is to do this in
> general, what about a pragmatic approach that might work in most of the
> cases you are interested in?
>
> Look at the first N lines.
>
> If any line has _any_ non-printing characters, it's not correct YAML and
> wasn't generated by YAML#dump.[1]
>
> If any are longer than M chars or other binary file heuristics apply[2],
> it's probably not a manually written YAML file.
>
> If it passes at least _one_ of these two checks, then check to see if
> 80% of the (first N) lines match the following:
>
> /^\s*(-|\?|[\w\s]*:)\s/
>
> Maybe add some logic to skip blocks of text like this (so they don't
> count against the 80%):
>
> a: |
> skip
> me
>
> Also, check for > in place of |.
>
> And also skip blanks and comments /^\s*(#|$)/.
>
> And then finally load it and rescue any ArgumentError.
>
> There are probably a lot of corner cases that kill this approach if you
> cannot tolerate false negatives (i.e., legit yaml that gets rejected by
> the above).
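
For reference, the heuristic quoted above might be sketched in Ruby roughly as follows. This is only an illustration under stated assumptions: the method name probably_yaml? and the thresholds n, max_len and ratio are made up here, the sketch requires both the non-printing check and the line-length check to pass, and it uses YAML.parse with a broad rescue (current Ruby's Psych raises Psych::SyntaxError rather than ArgumentError). The patterns are the ones from the post, anchored with \A because they are applied one line at a time.

```ruby
require 'yaml'

# Patterns from the quoted post, anchored with \A for per-line matching.
KEYISH = /\A\s*(-|\?|[\w\s]*:)\s/      # "- item", "? key", "key: value"
SKIP   = /\A\s*(#|\z)/                 # blank lines and comments
BLOCK  = /:\s*[|>][-+0-9]*\s*\z/       # "key: |" or "key: >" block headers

# Hypothetical helper; n, max_len and ratio are assumed thresholds.
def probably_yaml?(text, n: 50, max_len: 200, ratio: 0.8)
  return false unless text.valid_encoding?
  lines = text.lines.first(n)

  # Binary-file heuristics: non-printing characters or very long lines.
  return false if lines.any? { |l| l =~ /[^[:print:]\s]/ }
  return false if lines.any? { |l| l.chomp.length > max_len }

  counted = matched = 0
  block_indent = nil
  lines.each do |line|
    next if line =~ SKIP
    indent = line[/\A */].length
    if block_indent
      next if indent > block_indent    # still inside a block scalar
      block_indent = nil
    end
    counted += 1
    matched += 1 if line =~ KEYISH
    block_indent = indent if line =~ BLOCK
  end
  return false if counted.zero? || matched.fdiv(counted) < ratio

  YAML.parse(text)                     # final check: does it parse at all?
  true
rescue StandardError
  false
end
```

As the post itself warns, a check like this can reject legitimate YAML (for example, a document that is one long plain scalar), so it only suits cases where false negatives are tolerable.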
yikes! if that's what it takes then i must run away! :-) i need
something snappy. actually it just occurred to me that as of YAML 1.1
the document declaration is mandatory. I had forgotten about that. So
checking for an initial line starting with %YAML would do the trick as
long as docs were 1.1 compliant, at least in this regard.
Unfortunately Syck itself isn't 1.1 compliant in this respect
whatsoever :-(
In the meantime I'm just going to go with ara's suggestion. the use of
an initial '---' is an acceptable requirement for my needs.
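
A minimal sketch of that snappy check, assuming it is enough to look at the first line only. The names YAML_HEADER, yaml_header? and yaml_file? are made up for illustration; the pattern accepts either an initial '---' marker or a line starting with %YAML.

```ruby
# First line must be a "---" document marker or a "%YAML" directive.
YAML_HEADER = /\A(?:---(?:\s|\z)|%YAML\s)/

# Hypothetical helper: checks a single line (nil for an empty file).
def yaml_header?(first_line)
  !first_line.nil? && first_line.match?(YAML_HEADER)
end

# Hypothetical helper: reads only the first line of the file at path.
def yaml_file?(path)
  yaml_header?(File.open(path, 'rb', &:gets))
end
```

This never loads the document, so it stays cheap on large files, but it will reject YAML that omits the marker and accept any file that happens to begin with it.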
t.