comp.lang.ruby

memoize to a file

Brian Buckley

2/1/2006 3:33:00 AM

Hello all,

Using Memoize gem 1.2.0, memoizing TO a file appears to be working for me
but subsequently reading that file (say, by rerunning the same script)
appears NOT to be working (the fib(n) calls are being run again).
Inspecting the Memoize module I changed the line

cache = Hash.new.update(Marshal.load(File.read(file))) rescue { }
to
cache = Hash.new.update(Marshal.load(File.read(file)))

and instead of failing silently it now raises the error: "in `load':
marshal data too short (ArgumentError)"

My questions:
1 What is causing this error? (possibly Windows related?)
2 What is the purpose of the rescue{} suppressing the error info in the
first place?
3 Instead of using Marshal, would YAML be a reasonable alternative?
(I am thinking of the readability of the cache file and also the ability to
pre-populate it.)

Thanks.

-- Brian Buckley

------------------------------------------
require 'memoize'
include Memoize

def fib(n)
  puts "running... n is #{n}"
  return n if n < 2
  fib(n-1) + fib(n-2)
end

h = memoize(:fib, "fib.cache")
puts fib(10)
13 Answers

Daniel Berger

2/1/2006 3:42:00 AM

Brian Buckley wrote:
> Hello all,
>
> Using Memoize gem 1.2.0, memoizing TO a file appears to be working for me
> but subsequently reading that file (say, by rerunning the same script)
> appears NOT to be working (the fib(n) calls are being run again).
> Inspecting the Memoize module I changed the line
>
> cache = Hash.new.update(Marshal.load(File.read(file))) rescue { }
> to
> cache = Hash.new.update(Marshal.load(File.read(file)))
>
> and it instead of silently failing I now see the error message: "in `load':
> marshal data too short (ArgumentError)"
>
> My questions:
> 1 What is causing this error? (possibly Windows related?)

That is odd. I've run it on Windows with no trouble in the past. Is
it possible you ran this program using 1.8.2, downloaded 1.8.4, then
re-ran the same code using the same cache? It would fail with that
error if such is the case, since Marshal is not compatible between
versions of Ruby - not even minor versions.

> 2 What is the purpose of the rescue{} suppressing the error info in the
> first place?

The assumption (whoops!) was that if Hash.new.update failed it was
because there was no cache (i.e. first run), so just return an empty
hash.

> 3 Instead of using Marshall would using yaml be a reasonable alternative?
> (I am thinking of readability of the cache file and also capability to
> pre-populate it)

It will be slower, but it would work.

Regards,

Dan

Logan Capaldo

2/1/2006 3:51:00 AM

Basically it's using exceptions as flow control:

begin
  cache = Hash.new.update(Marshal.load(File.read(file)))
rescue
  cache = {} # empty hash
end

So if loading the file fails for whatever reason (e.g. this is the
first time the program has been run), it just starts with an empty
cache. I don't know why it's failing to read the file.
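
A minimal sketch of a narrower fallback, assuming the goal is to treat only a missing file as "first run" and to report any other load failure instead of hiding it. The helper name load_cache is hypothetical and is not part of the memoize gem:

```ruby
require 'tmpdir'

# Hypothetical helper (not part of the memoize gem): load a Marshal cache,
# returning an empty hash when there is nothing usable on disk.
def load_cache(path)
  File.open(path, "rb") { |io| Marshal.load(io) }
rescue Errno::ENOENT
  {} # first run: no cache file yet
rescue ArgumentError, TypeError, EOFError => e
  # Truncated, corrupt, or incompatible data: say so instead of hiding it.
  warn "ignoring unreadable cache #{path}: #{e.class}: #{e.message}"
  {}
end

Dir.mktmpdir do |dir|
  p load_cache(File.join(dir, "fib.cache")) # file does not exist yet
end
```

With this shape a "marshal data too short" failure would still fall back to an empty cache, but it would leave a warning on stderr rather than disappearing.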


Timothy Goddard

2/1/2006 4:18:00 AM

Just a thought, but you might like to load this file using the binary
option on Windows. Marshal uses a binary format, and Windows does weird
things to binary files loaded without the binary option.
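
One plausible way this bites a Fibonacci cache, offered as an assumption rather than a confirmed diagnosis of the file in question: Marshal stores a small positive integer n as the single byte n + 5, so 21 (which is fib(8)) becomes 0x1A, the Ctrl-Z byte that text-mode reads on Windows treat as end-of-file. The sketch below simulates that cut-off on any platform; hex_bytes and cut_at_ctrl_z are hypothetical helper names:

```ruby
# Render a binary string as space-separated hex bytes.
def hex_bytes(str)
  str.bytes.map { |b| format("%02x", b) }.join(" ")
end

# Simulate a reader that stops at the first Ctrl-Z (0x1A) byte,
# as text-mode reads on Windows are assumed to do here.
def cut_at_ctrl_z(data)
  stop = data.index("\x1A".b)
  stop ? data[0, stop] : data
end

dump = Marshal.dump({ 8 => 21 }) # fib(8) == 21
puts hex_bytes(dump)

begin
  Marshal.load(cut_at_ctrl_z(dump))
rescue ArgumentError => e
  puts e.message
end
```

The truncated string fails to load with the same "marshal data too short" ArgumentError reported above.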

Mauricio Fernández

2/1/2006 8:21:00 AM

On Wed, Feb 01, 2006 at 12:32:57PM +0900, Brian Buckley wrote:
> My questions:
> 1 What is causing this error? (possibly Windows related?)

IIRC File.read(file) doesn't open the file in binary mode; try
File.open(file, "rb"){|f| f.read}

> 2 What is the purpose of the rescue{} suppressing the error info in the
> first place?

setting cache to {} if Marshal.load fails for some reason (e.g. a major
change in the Marshal format across Ruby versions).

> 3 Instead of using Marshall would using yaml be a reasonable alternative?
> (I am thinking of readability of the cache file and also capability to
> pre-populate it)

I wouldn't do that:
* Marshal is faster than Syck (especially when dumping data)
* YAML takes more space than Marshal'ed data
* there are still more bugs in Syck than in Marshal (the nastiest memory
issues are believed to be fixed, but there is still occasional data
corruption)
* Marshal is more stable across Ruby releases

As for editing the cache, you can always do
require 'yaml'
File.open("cache.yaml", "w") do |out|
  YAML.dump(Marshal.load(File.open("cache", "rb"){|f| f.read}), out)
end

--
Mauricio Fernandez


Robert Klemme

2/1/2006 8:52:00 AM

Daniel Berger wrote:
> Brian Buckley wrote:
>> Hello all,
>>
>> Using Memoize gem 1.2.0, memoizing TO a file appears to be working
>> for me but subsequently reading that file (say, by rerunning the
>> same script) appears NOT to be working (the fib(n) calls are being
>> run again). Inspecting the Memoize module I changed the line
>>
>> cache = Hash.new.update(Marshal.load(File.read(file))) rescue { }
>> to
>> cache = Hash.new.update(Marshal.load(File.read(file)))
>>
>> and it instead of silently failing I now see the error message: "in
>> `load': marshal data too short (ArgumentError)"
>>
>> My questions:
>> 1 What is causing this error? (possibly Windows related?)
>
> That is odd. I've run it on Windows with no trouble in the past. Is
> it possible you ran this program using 1.8.2, downloaded 1.8.4, then
> re-ran the same code using the same cache? It would fail with that
> error if such is the case, since Marshal is not compatible between
> versions of Ruby - not even minor versions.
>
>> 2 What is the purpose of the rescue{} suppressing the error info in
>> the first place?
>
> The assumption (whoops!) was that if Hash.new.update failed it was
> because there was no cache (i.e. first run), so just return an empty
> hash.
>
>> 3 Instead of using Marshall would using yaml be a reasonable
>> alternative? (I am thinking of readability of the cache file and
>> also capability to pre-populate it)
>
> It will be slower, but it would work.

As you and others have pointed out, this is likely a problem caused by not
opening the file in binary mode. IMHO library code that uses Marshal should
always open files in binary mode (regardless of platform). The advantages
are twofold: we won't see this kind of error (i.e. the code is cross-platform),
and it serves as documentation (you know from reading the code that the file
is expected to contain binary data).

Also, the line looks a bit strange to me. Creating a new hash and
updating it with a hash read from disk seems superfluous. I'd rather do
something like this:

cache = File.open(file, "rb") {|io| Marshal.load(io)} rescue {}

Marshal.load and Marshal.dump can actually read from and write to an IO
object. This seems most efficient because the file contents do not have
to be read into memory before demarshalling, and it is fail-safe in the
same way as the old implementation.
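
A short sketch of both directions using IO objects in binary mode. The helper names write_cache and read_cache are hypothetical; Marshal.dump(obj, io) and Marshal.load(io) are the calls being illustrated:

```ruby
require 'tmpdir'

# Hypothetical helper names; Marshal.dump(obj, io) and Marshal.load(io)
# are the real calls being illustrated.
def write_cache(path, cache)
  File.open(path, "wb") { |io| Marshal.dump(cache, io) }
end

def read_cache(path)
  File.open(path, "rb") { |io| Marshal.load(io) }
rescue StandardError
  {} # same fallback as the original `rescue {}` modifier
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "fib.cache")
  write_cache(path, { [10] => 55 })
  p read_cache(path)
end
```

Opening with "wb" and "rb" on both sides keeps the bytes identical on every platform, so the writer and the reader cannot disagree about line-ending translation.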

Kind regards

robert



Brian Buckley

2/1/2006 1:00:00 PM

> > 1 What is causing this error? (possibly Windows related?)
>
> IIRC File.read(file) doesn't open the file in binary mode; try
> File.open(file, "rb"){|f| f.read}


Perfect. Changing

cache = Hash.new.update(Marshal.load(File.read(file))) rescue { }
to
cache = Hash.new.update(Marshal.load(File.open(file, "rb"){|f| f.read})) rescue { }

makes it work. Should this edit go into the gem (Daniel, if you're
listening)?


> 2 What is the purpose of the rescue{} suppressing the error info in the
> > first place?
>
> setting cache to {} if Marshal.load fails for some reason (e.g. a major
> change in the Marshal format across Ruby versions).
>

Got it. The error suppression here is just about always the correct way to
handle the situation.

> As for editing the cache, you can always do
> File.open("cache.yaml", "w") do |out|
>   YAML.dump(Marshal.load(File.open("cache", "rb"){|f| f.read}), out)
> end


Ahhh. Populate that Marshal formatted file using YAML. Good thought.
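
The reverse direction might look like the sketch below: edit a YAML file by hand, then convert it into the Marshal file that gets loaded. The function name yaml_to_marshal and the file names are hypothetical, and the integer-keyed layout is only an illustration, since the key layout the memoize gem actually expects is not shown in this thread:

```ruby
require 'yaml'
require 'tmpdir'

# Hypothetical helper: read a hand-edited YAML file and write the same
# data out in Marshal format (binary mode), returning the loaded hash.
def yaml_to_marshal(yaml_path, cache_path)
  cache = YAML.load_file(yaml_path)
  File.open(cache_path, "wb") { |out| Marshal.dump(cache, out) }
  cache
end

Dir.mktmpdir do |dir|
  yaml_path  = File.join(dir, "cache.yaml")
  cache_path = File.join(dir, "fib.cache")
  # Illustrative contents only; the real cache layout is an assumption.
  File.write(yaml_path, "---\n0: 0\n1: 1\n10: 55\n")
  yaml_to_marshal(yaml_path, cache_path)
  p File.open(cache_path, "rb") { |io| Marshal.load(io) }
end
```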

Ara.T.Howard

2/1/2006 3:32:00 PM

James Gray

2/1/2006 3:36:00 PM

On Feb 1, 2006, at 9:31 AM, ara.t.howard@noaa.gov wrote:

> why not pstore - it's done all that already and is built-in?

PStore is just a wrapper on top of Marshal for transactional file
storage. If you need transactions, it's great. Otherwise, you might
as well just use Marshal.

James Edward Gray II


Ara.T.Howard

2/1/2006 3:57:00 PM

James Gray

2/1/2006 4:10:00 PM

On Feb 1, 2006, at 9:56 AM, ara.t.howard@noaa.gov wrote:

> On Thu, 2 Feb 2006, James Edward Gray II wrote:
>
>> On Feb 1, 2006, at 9:31 AM, ara.t.howard@noaa.gov wrote:
>>
>>> why not pstore - it's done all that already and is built-in?
>>
>> PStore is just a wrapper on top of Marshal for transactional file
>> storage. If you need transactions, it's great. Otherwise, you
>> might as well just use Marshal.
>
> it's not quite only that. it also
>
> - does some simple checks when creating the file (readability, etc)
> - allows db usage to be multi-processed
> - supports deletion
> - rolls backs writes on exceptions / commits using ensure to
> avoid corrupt
> data file
> - handles read vs write actions using shared/excl locks to boost
> concurrency
> - uses md5 check to avoid un-needed writes
> - opens in correct modes for all platforms

These are all great points. Thanks for the lesson. ;)

James Edward Gray II
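
For comparison, a minimal sketch of the same Fibonacci cache kept in a PStore, assuming the pstore library is available to require. The helper name fib_cached and the file name are hypothetical; PStore.new, transaction, [] and []= are the library calls being shown:

```ruby
require 'pstore'
require 'tmpdir'

# Hypothetical helper: fill the store iteratively inside one transaction,
# reusing any values already saved by an earlier run.
def fib_cached(store, n)
  store.transaction do
    (0..n).each do |i|
      store[i] ||= (i < 2 ? i : store[i - 1] + store[i - 2])
    end
    store[n]
  end
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "fib.pstore")
  puts fib_cached(PStore.new(path), 10)
  # A fresh PStore object on the same file sees the saved values.
  puts PStore.new(path).transaction(true) { |s| s[10] }
end
```

The values are computed iteratively here because PStore does not allow nested transactions, so a recursive fib that opened a transaction per call would fail.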