Asp Forum - LazyLoad - comp.lang.ruby

Erik Veenstra

1/24/2006 1:21:00 PM

Imagine you're building a CVS-like repository. A repository
has modules, a module has branches, and a branch has _a lot of_
items (history!). This repository is handled by a standalone
server with the FTP protocol as its "frontend".

In a naive implementation, you simply load all items (except
the contents of the items) into memory, so they are readily
available. Pure OO modeling theory usually assumes exactly
this: "All objects live in core."

In reality, you don't want to do that. You only want to load
the items of a branch when they are first referred to (and
optionally unload them after a while). So we introduce
Branch@loaded and move the invocation of Branch#load from
Branch#initialize to each place in the code where Branch@items
is used. *Each* place, without exception! Sooner or later,
you'll forget one. It's too tricky... And bad coding...
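
A minimal sketch of that manual approach, to show how easily a guard goes missing. The ensure_loaded helper, the load_count counter and the sample data are assumptions made for illustration; none of them appear in the original post.

```ruby
# Hypothetical sketch: every reader of @items must call ensure_loaded first.
class Branch
  attr_reader :load_count

  def initialize
    @loaded = false
    @items = {}
    @load_count = 0
  end

  def load
    # Stand-in for the expensive fill described in the post.
    @items = { "README" => 1, "Makefile" => 2 }
    @loaded = true
    @load_count += 1
  end

  def ensure_loaded
    load unless @loaded
  end

  def item_names
    ensure_loaded # guard present
    @items.keys
  end

  def item_count
    @items.size # guard forgotten: reports 0 until something else loads
  end
end

branch = Branch.new
p branch.item_count # 0, because nothing has triggered the load yet
p branch.item_names # ["README", "Makefile"]
p branch.item_count # 2
```

The forgotten guard in item_count does not raise; it just returns a wrong answer, which is what makes this style hard to maintain.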

So I came up with this LazyLoad, a generic lazy-loading class.
We can initialize Branch@items to LazyLoad.new(self, :load,
:items) instead of Hash.new. Whenever this object is referred
to (e.g. with @items.keys), LazyLoad#method_missing is invoked.
This method invokes Branch#load, gets the object Branch@items
(which now refers to a filled Hash) and sends the original
message to this Branch@items. This instance of LazyLoad now
dies in peace.

I implemented LazyLoad (see below) and use it in a real
situation. Seems to work. The server starts really fast and the
user thinks that all branches are loaded.

I embedded the backend in the commandline tool as well. If you
use this commandline tool to synchronize the local workset with
the repository, you usually want to load only *one* branch, not
all of them. The speed benefit is huge, whereas the impact on
the code is close to zero!

The code below demonstrates this theory: step 1 is the naive
implementation of Branch, step 2 is the enhanced implementation
of Branch, and step 3 implements LazyLoad itself. (Steps 1 and 2
only illustrate where LazyLoad fits in. They are not complete.)

Comments? Ideas? Something I overlooked?

gegroet,
Erik V. - http://www.erikve...

----------------------------------------------------------------

# STEP 1, NAIVE IMPLEMENTATION

class Branch
  def initialize
    @items = {}

    load
  end

  def load
    @items = {}

    # Fill @items... EXPENSIVE, TIME CONSUMING, MEMORY HUNGRY!
  end
end

----------------------------------------------------------------

# STEP 2, INTRODUCING LAZYLOAD

class Branch
  def initialize
    @items = LazyLoad.new(self, :load, :items)
  end

  def load
    @items = {}

    # Fill @items... EXPENSIVE, TIME CONSUMING, MEMORY HUNGRY!
  end
end

----------------------------------------------------------------

# STEP 3, IMPLEMENTATION OF LAZYLOAD

class LazyLoad
  def initialize(object, load_method, property)
    @object = object
    @property = property
    @load_method = load_method
  end

  def method_missing(method_name, *parms, &block)
    @object.send(@load_method)
    @object.instance_eval("@#{@property.to_s}").send(method_name, *parms, &block)
  end
end
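
A short usage sketch of steps 2 and 3 together, assuming a load_count counter, an item_names reader and sample data that are not part of the original post. It uses instance_variable_get in place of the instance_eval string; the two should behave the same here.

```ruby
class LazyLoad
  def initialize(object, load_method, property)
    @object = object
    @load_method = load_method
    @property = property
  end

  def method_missing(method_name, *parms, &block)
    # Trigger the owner's load, then forward to the freshly assigned value.
    @object.send(@load_method)
    @object.instance_variable_get("@#{@property}").send(method_name, *parms, &block)
  end
end

class Branch
  attr_reader :load_count # hypothetical counter, for demonstration only

  def initialize
    @load_count = 0
    @items = LazyLoad.new(self, :load, :items)
  end

  def load
    @load_count += 1
    @items = { "README" => 1, "Makefile" => 2 } # stand-in for the real fill
  end

  def item_names
    @items.keys # no guard needed at the call site
  end
end

branch = Branch.new
p branch.load_count # 0, construction did not load anything
p branch.item_names # ["README", "Makefile"]
p branch.item_names # second call goes straight to the Hash
p branch.load_count # 1
```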

----------------------------------------------------------------

23 Answers

Dave Burt

1/24/2006 2:17:00 PM

Erik Veenstra wrote:
> ...
> Comments? Ideas? Something I overlooked?
> ...
> class LazyLoad
> def initialize(object, load_method, property)
> @object = object
> @property = property
> @load_method = load_method
> end
>
> def method_missing(method_name, *parms, &block)
> @object.send(@load_method)
> @object.instance_eval("@#{@property.to_s}").send(method_name,
> *parms, &block)
> end
> end

Interesting. Here's a potential problem. You won't hit method_missing for
methods that Object has. In particular, a client might use dup, ==, to_s,
class, kind_of?, and so on. You might try getting around some of this by
using Delegate from the standard library, or Facets' BlankSlate IIRC.
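
To make the problem concrete, here is a small sketch using the three-argument LazyLoad from the original post. The Holder class, its raw_items accessor and the sample data are assumptions for illustration only.

```ruby
class LazyLoad
  def initialize(object, load_method, property)
    @object = object
    @load_method = load_method
    @property = property
  end

  def method_missing(method_name, *parms, &block)
    @object.send(@load_method)
    @object.instance_variable_get("@#{@property}").send(method_name, *parms, &block)
  end
end

class Holder # hypothetical owner class
  attr_reader :load_count

  def initialize
    @load_count = 0
    @items = LazyLoad.new(self, :load, :items)
  end

  def load
    @load_count += 1
    @items = { a: 1, b: 2 }
  end

  def raw_items
    @items # hands out whatever @items currently refers to
  end
end

holder = Holder.new
proxy = holder.raw_items

p proxy.class          # LazyLoad, answered by Object, never forwarded
p proxy.kind_of?(Hash) # false
p proxy == {}          # false
p holder.load_count    # 0, none of the calls above triggered a load
p proxy.size           # 2, Object has no size, so this one is forwarded
p holder.load_count    # 1
```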

I was thinking three parameters was too many for LazyLoad#initialize (just
provide a block which loads and returns the loaded value), and started to
rewrite it, when I remembered MenTaLguY's lazy.rb, which does something very
similar.

STEP 4, INTRODUCING lazy.rb

class Branch
  def initialize
    @items = promise do
      @items = {}
      # Fill @items... EXPENSIVE, TIME CONSUMING, MEMORY HUNGRY!
      @items
    end
  end
end

Cheers,
Dave

Erik Veenstra

1/24/2006 4:07:00 PM

> Interesting. Here's a potential problem. You won't hit
> method_missing for methods that Object has. In particular, a
> client might use dup, ==, to_s, class, kind_of?, and so on.
> You might try getting around some of this by using Delegate
> from the standard library, or Facets' BlankSlate IIRC.

I was aware of this problem and was already working it out. My
solution is to overwrite all (except a few) already defined
methods, so they call method_missing.

> I was thinking three parameters was too many for
> LazyLoad#initialize (just provide a block which loads and
> returns the loaded value), and started to rewrite it, when I
> remembered MenTaLguY's lazy.rb, which does something very
> similar.

Right... Maybe using a block was just too obvious...

New versions below.

(I keep the method Branch#load, because it not only fills
Branch@items, but Branch@snapshots as well.)

Thanks.

More comments? More ideas? More things I overlooked?

gegroet,
Erik V. - http://www.erikve...

----------------------------------------------------------------

class LazyLoad
  instance_methods.each do |method_name|
    unless ["__send__", "__id__"].include?(method_name)
      class_eval <<-"EOF"
        def #{method_name}(*parms, &block)
          method_missing(:#{method_name}, *parms, &block)
        end
      EOF
    end
  end

  def initialize(&block)
    @block = block
  end

  def method_missing(method_name, *parms, &block)
    @block.call.send(method_name, *parms, &block)
  end
end

----------------------------------------------------------------

class Branch
  def initialize
    @items = LazyLoad.new{load; @items}
    @snapshots = LazyLoad.new{load; @snapshots}
  end

  def load
    @items = {}
    @snapshots = {}

    # Fill @items and @snapshots...
    # EXPENSIVE, TIME CONSUMING, MEMORY HUNGRY!
  end
end

----------------------------------------------------------------

Erik Veenstra

1/24/2006 4:40:00 PM

As a side effect of the last improvement, the attr_reader now
works too.

gegroet,
Erik V. - http://www.erikve...

----------------------------------------------------------------

require "lazyload"

class Thing
  attr_reader :prop1
  attr_reader :prop2
  attr_writer :prop2

  def initialize
    @prop1 = LazyLoad.new{:it_works}
    @prop2 = LazyLoad.new{:nothing}
  end
end

thing = Thing.new

thing.prop2 = :this_too

p thing.prop1
p thing.prop2

----------------------------------------------------------------

MenTaLguY

1/24/2006 9:54:00 PM

Quoting Erik Veenstra <google@erikveen.dds.nl>:

> instance_methods.each do |method_name|
> unless ["__send__", "__id__"].include?(method_name)
> class_eval <<-"EOF"
> def #{method_name}(*parms, &block)
> method_missing(:#{method_name}, *parms, &block)
> end
> EOF
> end
> end

I've seen a lot of different ways of writing this; my personal
favorite is:

instance_methods.each { |m| undef_method m unless m =~ /^__/ }
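
A sketch of that idiom in a complete class, for readers who want to try it. BlankProxy is a hypothetical name. The extra object_id exception is an addition here, made only because recent Ruby versions warn when object_id is undefined, and the symbol is converted with to_s because newer Rubies return method names as symbols.

```ruby
class BlankProxy # hypothetical name, not from the thread
  # Undefine every public instance method except the double-underscore ones
  # (and object_id, kept only to avoid a warning on recent Rubies).
  instance_methods.each do |m|
    undef_method m unless m.to_s =~ /^__/ || m.to_s == "object_id"
  end

  def initialize(&block)
    @block = block
  end

  def method_missing(method_name, *parms, &block)
    # Not memoized: the block runs on every forwarded call.
    @block.call.send(method_name, *parms, &block)
  end
end

proxy = BlankProxy.new { { a: 1 } }

p proxy.class          # Hash, because class is now forwarded
p proxy.kind_of?(Hash) # true
p proxy == { a: 1 }    # true
```

With the inherited methods gone, calls such as class and == reach method_missing and are answered by the wrapped object.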

(I think I picked this up from someone on ruby-talk)

> def method_missing(method_name, *parms, &block)
> @block.call.send(method_name, *parms, &block)
> end

One thing to be careful of -- if the block can't successfully
replace all references to the LazyLoad instance with the result
object, you will see really bizarre behavior, as each new method
call on the LazyLoad reruns the (potentially expensive)
computation, only to be routed to a completely different object.

To get around that problem, you'd want to remember the block's
result after calling it the first time (as lazy.rb does), rather
than calling it multiple times.
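
The difference can be sketched with two small stand-in proxies. RerunProxy and MemoProxy are hypothetical names built on BasicObject for brevity; they are not the thread's LazyLoad, nor lazy.rb's implementation.

```ruby
# Runs the block on every call, like the LazyLoad quoted above.
class RerunProxy < BasicObject
  def initialize(&block)
    @block = block
  end

  def method_missing(name, *args, &blk)
    @block.call.__send__(name, *args, &blk)
  end
end

# Remembers the block's result after the first call (not thread-safe).
class MemoProxy < BasicObject
  def initialize(&block)
    @block = block
    @evaluated = false
  end

  def method_missing(name, *args, &blk)
    unless @evaluated
      @result = @block.call
      @evaluated = true
    end
    @result.__send__(name, *args, &blk)
  end
end

calls = 0
rerun = RerunProxy.new { calls += 1; [1, 2, 3] }
rerun << 4
p rerun.size # 3, the append went to an array that was thrown away
p calls      # 2

calls = 0
memo = MemoProxy.new { calls += 1; [1, 2, 3] }
memo << 4
p memo.size  # 4
p calls      # 1
```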

> class Branch
> def initialize
> @items = LazyLoad.new{load; @items}
> @snapshots = LazyLoad.new{load; @snapshots}
> end
>
> def load
> @items = {}
> @snapshots = {}
>
> # Fill @items and @snapshots...
> # EXPENSIVE, TIME CONSUMING, MEMORY HUNGRY!
> end
> end

You can do this with lazy.rb too:

class Branch
  def initialize
    @items = promise{load; @items}
    @snapshots = promise{load; @snapshots}
  end

  def load
    @items = {}
    @snapshots = {}

    # Fill @items and @snapshots...
    # EXPENSIVE, TIME CONSUMING, MEMORY HUNGRY!
  end
end

Since you mentioned you were using this in production, you really
might want to look into using lazy.rb instead.

While discovering the issues for yourself can be a valuable learning
experience, I'm not sure you want to be doing that in production
code...

-mental

MenTaLguY

1/24/2006 9:56:00 PM

Quoting Erik Veenstra <google@erikveen.dds.nl>:

> def initialize
> @prop1 = LazyLoad.new{:it_works}
> @prop2 = LazyLoad.new{:nothing}
> end

Ah, careful. Every method call on @prop1 or @prop2 would rerun the
block. You won't notice any problems if all the block does is
return a symbol, but for anything more complex ... well, see my
last email.

-mental

MenTaLguY

1/24/2006 10:05:00 PM

Quoting Erik Veenstra <google@erikveen.dds.nl>:

> @items = LazyLoad.new{load; @items}
> @snapshots = LazyLoad.new{load; @snapshots}

I forgot to mention. I think you've hit on a nice idiom for using
lazy evaluation in Ruby here.

In simpler cases, it could look something like:

@blah = promise { @blah = expensive_computation }

Good find!

-mental

Erik Veenstra

1/24/2006 11:20:00 PM

> > def initialize
> > @prop1 = LazyLoad.new{:it_works}
> > @prop2 = LazyLoad.new{:nothing}
> > end
>
> Ah, careful. Every method call on @prop1 or @prop2 would
> rerun the block. You won't notice any problems if all the
> block does is return a symbol, but for anything more complex
> ... well, see my last email.

Yeah, true. Bad example. The original code (the branches) uses
a common method "load" which sets both properties. If one of
them is used, the other gets set as well. No problem there.

gegroet,
Erik V. - http://www.erikve...

Erik Veenstra

1/24/2006 11:42:00 PM

> To get around that problem, you'd want to remember the
> block's result after calling it the first time (as lazy.rb
> does), rather than calling it multiple times.

I added some things to make it thread safe (see below). One of
them was @real_object. That's what you mean, I think.

> You can do this with lazy.rb too:

I've never seen lazy.rb before, but by now the package is
already on my desktop. Needs some investigation. Although doing
it yourself is a good learning experience indeed. :-)

> Since you mentioned you were using this in production, you
> really might want to look into using lazy.rb instead.

My "real situation" is a shadow repository for CVS. It's not
too bad when it dies. Running the conversion job will refill
the DB. :-)

Thanks.

gegroet,
Erik V. - http://www.erikve...

----------------------------------------------------------------

require "thread"

class LazyLoad
  instance_methods.each do |method_name|
    unless ["__send__", "__id__"].include?(method_name)
      class_eval <<-"EOF"
        def #{method_name}(*parms, &block)
          method_missing(:#{method_name}, *parms, &block)
        end
      EOF
    end
  end

  def initialize(*parms, &block)
    @parms = parms
    @block = block
    @mutex = Mutex.new
    @evaluated = false
    @real_object = nil
  end

  def method_missing(method_name, *parms, &block)
    @mutex.synchronize do
      @real_object = @block.call(*@parms) unless @evaluated
      @evaluated = true
    end

    @real_object.send(method_name, *parms, &block)
  end
end

module Kernel
  def lazy(*parms, &block)
    LazyLoad.new(*parms, &block)
  end
end
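
A quick way to check the run-once behaviour under threads, as a sketch. LazyValue is a hypothetical stand-in built on BasicObject with the same mutex-and-flag logic as the class above; the sleep only widens the race window.

```ruby
class LazyValue < BasicObject # hypothetical stand-in for the class above
  def initialize(*parms, &block)
    @parms = parms
    @block = block
    @mutex = ::Mutex.new
    @evaluated = false
    @real_object = nil
  end

  def method_missing(method_name, *parms, &block)
    # Only the first caller evaluates the block; later callers reuse the result.
    @mutex.synchronize do
      unless @evaluated
        @real_object = @block.call(*@parms)
        @evaluated = true
      end
    end

    @real_object.__send__(method_name, *parms, &block)
  end
end

evaluations = 0
value = LazyValue.new(3) do |n|
  sleep 0.05 # give the other threads time to pile up on the mutex
  evaluations += 1
  Array.new(n) { |i| i + 1 }
end

sizes = Array.new(8) { Thread.new { value.size } }.map(&:value)
p sizes       # [3, 3, 3, 3, 3, 3, 3, 3]
p evaluations # 1
```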

----------------------------------------------------------------

MenTaLguY

1/24/2006 11:50:00 PM

Quoting Erik Veenstra <google@erikveen.dds.nl>:

> Yeah, true. Bad example. The original code (the branches) uses
> a common method "load" which sets both properties. If one of
> them is used, the other gets set as well. No problem there.

Actually there is a problem the moment such an instance variable is
assigned to another variable or passed as a parameter. The
LazyLoad instance escapes the block's control.

class Example
  def initialize
    @prop = LazyLoad.new { @prop = [1, 2, 3] }
  end
  def show3
    helper( @prop )
  end
  def helper( arr )
    p arr.pop
    p arr.pop
    p arr.pop
  end
end

ex = Example.new
ex.show3

What would you expect this program to output? What actually
happens?

-mental

Erik Veenstra

1/24/2006 11:56:00 PM

It does work with the thread safe version I posted a couple of
minutes ago... :o)
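
The two behaviours can be compared side by side with a small sketch. SketchProxy and its memoize: flag are assumptions for illustration (a simplified, non-thread-safe stand-in, not the LazyLoad posted in this thread), and helper returns the popped values instead of printing them.

```ruby
class SketchProxy < BasicObject # hypothetical, for comparison only
  def initialize(memoize:, &block)
    @memoize = memoize
    @block = block
    @evaluated = false
  end

  def method_missing(name, *args, &blk)
    if @memoize
      # Remember the result, as the thread-safe version does.
      unless @evaluated
        @target = @block.call
        @evaluated = true
      end
      @target.__send__(name, *args, &blk)
    else
      # Rerun the block on every call, as the earlier version did.
      @block.call.__send__(name, *args, &blk)
    end
  end
end

class Example
  def initialize(memoize)
    @prop = SketchProxy.new(memoize: memoize) { @prop = [1, 2, 3] }
  end

  def show3
    helper(@prop)
  end

  def helper(arr)
    # arr is still the proxy, even after @prop has been reassigned.
    [arr.pop, arr.pop, arr.pop]
  end
end

p Example.new(false).show3 # [3, 3, 3]
p Example.new(true).show3  # [3, 2, 1]
```

Without memoization each pop lands on a brand-new array, so the escaped proxy keeps answering 3; once the result is remembered, the pops drain a single array.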

gegroet,
Erik V. - http://www.erikve...
