comp.lang.ruby

File.yaml?(fname)

Trans

12/9/2006 5:28:00 PM

what's the best way to determine if a file is yaml?

thanks,
t.


5 Answers

Devin Mullins

12/9/2006 5:46:00 PM


Trans wrote:
> what's the best way to determine if a file is yaml?
Naive answer:

require 'yaml'

def File.yaml?(fname)
  YAML.load(IO.read(fname))
  true
rescue ArgumentError   # raised by Syck when the parse fails
  false
end

Though, open up irb -ryaml and keep running this line:
YAML.load Array.new(60){rand 256}.pack('c*')

I'm not sure that's what you're after. :)

And I'm guessing you didn't mean:
def File.yaml?(fname)
  extname(fname) =~ /^\.ya?ml$/   # extname keeps the leading dot, e.g. ".yaml"
end
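
For readers on a current Ruby, here is a minimal sketch of the same parse-and-rescue idea. It assumes Psych (the YAML engine bundled with modern Ruby) rather than Syck, so the error to rescue is Psych::SyntaxError instead of ArgumentError. The helper names yaml_syntax? and yaml_file? are hypothetical, not from the thread.

```ruby
require 'yaml'

# Hypothetical helper (not from the thread): true when the text is
# syntactically valid YAML according to Psych's parser.
def yaml_syntax?(text)
  Psych.parse(text)   # builds a syntax tree only; no Ruby objects are created
  true
rescue Psych::SyntaxError
  false
end

# File-level wrapper in the spirit of the original File.yaml?
def yaml_file?(fname)
  yaml_syntax?(File.read(fname))
end

yaml_syntax?("a: 1\nb: [2, 3]\n")  # => true
yaml_syntax?("a: [1, 2\n")         # => false (unterminated flow sequence)
yaml_syntax?("just some prose")    # => true, a bare string is a valid YAML scalar
```

The last call shows why this answer is naive: almost any plain text parses as a single YAML scalar, so a successful parse says little about whether the file was meant to be YAML.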

Devin

Paul Lutus

12/9/2006 5:51:00 PM


Trans wrote:

> what's the best way to determine if a file is yaml?

Process the file using a parser meant to process YAML. If the parse fails,
it means:

1. The file isn't YAML.

2. The chosen parser is not robust enough to process this specific, valid
YAML file.

3. The YAML file, although more or less valid YAML, has syntax errors not
consistent with the formal YAML specification.

4. The YAML specification contains ambiguities that allow a valid parser to
fail on valid YAML syntax.

5. Other.

In other words, you cannot really say, absolutely and unambiguously, that a
particular file is a YAML file.
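
Since a failed parse can have any of the causes listed above, it may help to report why the parser gave up instead of returning a bare false. A rough sketch, assuming modern Ruby with Psych; the helper name yaml_failure is hypothetical:

```ruby
require 'yaml'

# Hypothetical helper: returns nil when the text parses, otherwise a short
# description of where and why Psych stopped.
def yaml_failure(text)
  Psych.parse(text)
  nil
rescue Psych::SyntaxError => e
  "line #{e.line}, column #{e.column}: #{e.problem}"
end

puts yaml_failure("a: [1, 2\n").inspect   # a String describing the failure
puts yaml_failure("a: 1\n").inspect       # prints "nil"
```

This does not tell you which of the five cases applies, but it gives a human enough detail to judge.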

--
Paul Lutus
http://www.ara...

Joel VanderWerf

12/9/2006 8:05:00 PM


Trans wrote:
> what's the best way to determine if a file is yaml?

In light of the other responses, which show how hard it is to do this in
general, what about a pragmatic approach that might work in most of the
cases you are interested in?

Look at the first N lines.

If any line has _any_ non-printing characters, it's not correct YAML and
wasn't generated by YAML.dump.[1]

If any are longer than M chars or other binary file heuristics apply[2],
it's probably not a manually written YAML file.

If it passes at least _one_ of these two checks, then check to see if
80% of the (first N) lines match the following:

/^\s*(-|\?|[\w\s]*:)\s/

Maybe add some logic to skip blocks of text like this (so they don't
count against the 80%):

  a: |
    skip
    me

Also, check for > in place of |.

And also skip blanks and comments /^\s*(#|$)/.

And then finally load it and rescue any ArgumentError.

There are probably a lot of corner cases that kill this approach if you
cannot tolerate false negatives (i.e., legit yaml that gets rejected by
the above).
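
The steps above can be sketched roughly as follows. This is one possible reading, not a tested implementation: the name probably_yaml?, the defaults of 50 lines and 0.8, and the indentation-based block-scalar skipping are assumptions. The line-length check (M chars) is left out, and the final step uses Psych.parse and Psych::SyntaxError on the assumption of a modern Ruby, in place of the load and ArgumentError mentioned above.

```ruby
require 'yaml'

LINE_PATTERN = /^\s*(-|\?|[\w\s]*:)\s/        # the pattern from the post, unchanged
SKIP_PATTERN = /\A\s*(#|\z)/                  # blank lines and comments
BLOCK_OPENER = /(:|-)\s+[|>][-+]?\s*\z/       # "key: |" or "key: >" (assumed form)
NON_PRINTING = /[^[:print:]\s]/

def probably_yaml?(text, max_lines: 50, threshold: 0.8)
  return false unless text.valid_encoding?
  lines = text.lines.first(max_lines)
  return false if lines.any? { |line| line =~ NON_PRINTING }

  counted = 0
  matched = 0
  block_indent = nil   # indentation of the line that opened a block scalar

  lines.each do |line|
    next if line =~ SKIP_PATTERN
    indent = line[/\A */].size
    if block_indent
      next if indent > block_indent   # still inside the block scalar body
      block_indent = nil
    end
    counted += 1
    matched += 1 if line =~ LINE_PATTERN
    block_indent = indent if line =~ BLOCK_OPENER
  end

  return false if counted.zero? || matched.fdiv(counted) < threshold

  Psych.parse(text)   # final check: does it parse at all?
  true
rescue Psych::SyntaxError
  false
end

doc = "name: demo\nitems:\n  - one\n  - two\ntext: |\n  free form line\n  another line\n"
probably_yaml?(doc)                                      # => true
probably_yaml?("Just a paragraph.\nNothing YAML-ish.\n") # => false
```

Note that a leading '---' line does not match the pattern as written, so it counts against the 80%.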

---

[1] The YAML spec, http://yaml.org/spec/cu..., says nonprinting
chars are encoded (see 4.1.1. Character Set), and it seems to be true,
at least in the dump output:

irb(main):023:0> puts({"a"=>"\002"}.to_yaml)
---
a: !binary |
  Ag==

However, YAML can load unescaped binary data, as Devin showed:

irb(main):025:0> YAML.load "a: \002"
=> {"a"=>"\002"}

[2] For example,
http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-...

--
vjoel : Joel VanderWerf : path berkeley edu : 510 665 3407

Ara.T.Howard

12/9/2006 8:06:00 PM


Trans

12/9/2006 11:01:00 PM



Joel VanderWerf wrote:
> Trans wrote:
> > what's the best way to determine if a file is yaml?
>
> In light of the other responses, which show how hard it is to do this in
> general, what about a pragmatic approach that might work in most of the
> cases you are interested in?
>
> Look at the first N lines.
>
> If any line has _any_ non-printing characters, it's not correct YAML and
> wasn't generated by YAML#dump.[1]
>
> If any are longer than M chars or other binary file heuristics apply[2],
> it's probably not a manually written YAML file.
>
> If it passes at least _one_ of these two checks, then check to see if
> 80% of the (first N) lines match the following:
>
> /^\s*(-|\?|[\w\s]*:)\s/
>
> Maybe add some logic to skip blocks of text like this (so they don't
> count against the 80%):
>
> a: |
> skip
> me
>
> Also, check for > in place of |.
>
> And also skip blanks and comments /^\s*(#|$)/.
>
> And then finally load it and rescue any ArgumentError.
>
> There are probably a lot of corner cases that kill this approach if you
> cannot tolerate false negatives (i.e., legit yaml that gets rejected by
> the above).

yikes! if that's what it takes then i must run away! :-) i need
something snappy. actually it just occurred to me that as of YAML 1.1
the document declaration is mandatory. I had forgotten about that. So
checking for an initial line starting with %YAML would do the trick as
long as docs were 1.1 compliant, at least in this regard.
Unfortunately Syck itself isn't 1.1 compliant in this respect
whatsoever :-(

In the meantime I'm just going to go with ara's suggestion. the use of
an initial '---' is an acceptable requirement for my needs.
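
A quick sketch of that check, assuming the only requirement is that the first line starts with '---' (a leading %YAML directive is accepted too). The helper name yaml_header? is hypothetical, and only the first line is read, so it stays snappy on large files:

```ruby
require 'tempfile'

# Hypothetical helper: true when the first line starts with a '---'
# document marker or a '%YAML' directive.
def yaml_header?(fname)
  first_line = File.open(fname, &:gets)   # nil for an empty file
  !first_line.nil? && first_line.start_with?('---', '%YAML')
end

Tempfile.create('sample') do |f|
  f.write("---\nname: demo\n")
  f.flush
  puts yaml_header?(f.path)   # prints "true"
end
```

This is a convention check, not a validity check: a file that starts with '---' can still fail to parse.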

t.