Asp Forum - [SUMMARY] SerializableProc (#38)

James Gray

7/14/2005 12:51:00 PM

The solutions this time show some interesting differences in approach, so I want
to walk through a handful of them below. The very first solution was from Robin
Stocker and that's a fine place to start. Here's the class:

class SerializableProc

  def initialize( block )
    @block = block
    # Test if block is valid.
    to_proc
  end

  def to_proc
    # Raises exception if block isn't valid, e.g. SyntaxError.
    eval "Proc.new{ #{@block} }"
  end

  def method_missing( *args )
    to_proc.send( *args )
  end

end

It can't get much simpler than that. The main idea here, and in all the
solutions, is that we need to capture the source of the Proc. The source is
just a String so we can serialize that with ease and we can always create a new
Proc if we have the source. In other words, Robin's main idea is to go
(syntactically) from this:

Proc.new {
  puts "Hello world!"
}

To this:

SerializableProc.new %q{
  puts "Hello world!"
}

In the first pure Ruby version we're building a Proc with the block of code to
define the body. In the second SerializableProc version, we're just passing a
String to the constructor that can be used to build a block. Christian
Neukirchen had something very interesting to say about the change:

Obvious problems of this approach are the lack of closures and editor
support (depending on the inverse quality of your editor :P)...

We'll get back to the lack of closures issue later, but I found the "inverse
quality of your editor" claim interesting. The meaning is that a poor editor
may not consider %q{...} equivalent to '...'. If it doesn't realize a String is
being entered, it may continue to syntax highlight the code inside. Of course,
you could always remove the %q whenever you want to see the code highlighting,
but that's tedious.

Getting back to Robin's class, initialize() just stores the String and creates a
Proc from it so an Exception will be thrown at construction time if fed invalid
code. The method to_proc() is what builds the Proc object by wrapping the
String in "Proc.new { ... }" and calling eval(). Finally, method_missing() makes
SerializableProc behave close to a Proc. Anytime it sees a method call that
isn't initialize() or to_proc(), it creates a Proc object and forwards the
message.
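The forwarding trick can be sketched on its own. This is a minimal illustration, not Robin's submission: the class name ForwardingProc is made up, and the respond_to_missing?() hook is an addition that postdates the quiz.

```ruby
# Hypothetical class name; a sketch of method_missing() forwarding.
class ForwardingProc
  def initialize(source)
    @source = source
  end

  # Build a brand new Proc from the stored source on every call.
  def to_proc
    eval "Proc.new { #{@source} }"
  end

  # Forward anything we don't define ourselves to a freshly built Proc.
  def method_missing(name, *args, &block)
    target = to_proc
    return super unless target.respond_to?(name)
    target.send(name, *args, &block)
  end

  # Keep respond_to?() honest about the forwarded methods.
  def respond_to_missing?(name, include_private = false)
    to_proc.respond_to?(name, include_private) || super
  end
end

shout = ForwardingProc.new("|text| text.upcase")
puts shout.call("hello")   # call() is handled by the rebuilt Proc
puts shout.arity           # so is arity(), or any other Proc method
```

Each forwarded call pays for a fresh eval(), which is the speed concern raised later in this summary.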

We don't see anything specific to Serialization in Robin's code, because both
Marshal (PStore uses Marshal) and YAML can handle a custom class with String
instance data. Like magic, it all just works.
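To see why no extra code is needed, here is a small round trip through Marshal. The class below is a hypothetical stand-in that keeps only a String, as Robin's class does; it is not the submitted code.

```ruby
# Hypothetical stand-in: the only instance data is the source String.
class SourceOnlyProc
  def initialize(source)
    @source = source
  end

  def to_proc
    eval "Proc.new { #{@source} }"
  end

  def call(*args)
    to_proc.call(*args)
  end
end

adder    = SourceOnlyProc.new("|a, b| a + b")
bytes    = Marshal.dump(adder)      # only @source ends up in the dump
restored = Marshal.load(bytes)
puts restored.call(2, 3)
```

Marshal never sees a Proc here, because one only exists for the duration of a call.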

Robin had a complaint though:

I imagine my solution is not very fast, as each time a method on the
SerializableProc is called, a new Proc object is created.

The object could be saved in an instance variable @proc so that speed is
only low on the first execution. But that would require the definition of
custom dump methods for each Dumper so that it would not attempt to dump
@proc.

My own solution (and others) do cache the Proc and define some custom dump
methods. Let's have a look at how something like that comes out:

class SerializableProc
  def self._load( proc_string )
    new(proc_string)
  end

  def initialize( proc_string )
    @code = proc_string
    @proc = nil
  end

  def _dump( depth )
    @code
  end

  def method_missing( method, *args )
    if to_proc.respond_to? method
      @proc.send(method, *args)
    else
      super
    end
  end

  def to_proc( )
    return @proc unless @proc.nil?

    if @code =~ /\A\s*(?:lambda|proc)(?:\s*\{|\s+do).*(?:\}|end)\s*\Z/
      @proc = eval @code
    elsif @code =~ /\A\s*(?:\{|do).*(?:\}|end)\s*\Z/
      @proc = eval "lambda #{@code}"
    else
      @proc = eval "lambda { #{@code} }"
    end
  end

  def to_yaml( )
    @proc = nil
    super
  end
end

My initialize() is the same, save that I create a variable to hold the Proc
object and I wasn't clever enough to trigger the early Exception when the code
is bad. My to_proc() looks scary but I just try to accept a wider range of
Strings, wrapping them in only what they need. The end result is the same.
Note that any Proc created is cached. My method_missing() is also very similar.
If the Proc object responds to the method, it is forwarded. The first line of
method_missing() calls to_proc() to ensure we've created one. After that, it
can safely use the @proc variable.

The _load() class method and _dump() instance method are what it takes to support
Marshal. First, _dump() is expected to return a String that could be used to
rebuild the instance. Then, _load() is passed that String on reload and
expected to return the recreated instance. The String choice is simple in this
case, since we're using the source.
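As a rough sketch of that contract, the hypothetical class below caches its Proc but hands Marshal only the source String. The name CachedProc and the simplified to_proc() are assumptions made for illustration.

```ruby
# Hypothetical class name; shows the _dump()/_load() pair with a cached Proc.
class CachedProc
  # Marshal.load passes in the String that _dump() returned.
  def self._load(source)
    new(source)
  end

  def initialize(source)
    @source = source
    @proc = nil
  end

  # Marshal.dump calls this and expects a String back.
  def _dump(_depth)
    @source
  end

  # Build the Proc once, then reuse it.
  def to_proc
    @proc ||= eval("lambda { #{@source} }")
  end

  def call(*args)
    to_proc.call(*args)
  end
end

doubler = CachedProc.new("|x| x * 2")
doubler.call(1)                              # the Proc is now cached
copy = Marshal.load(Marshal.dump(doubler))   # still fine, @proc is never dumped
puts copy.call(21)
```

The copy starts with an empty cache and rebuilds its Proc on first use.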

There are multiple ways to support YAML serialization, but I opted for the super
simple cheat. YAML can't serialize a Proc, but it's just a cache that can
always be restored. I just override to_yaml() and clear the cache before
handing serialization back to the default method. My code is unaffected by the
Proc's absence and it will recreate it when needed.
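Here is how that cheat might look against the YAML library shipped with current Ruby (Psych), which is an assumption since the quiz predates it. The class name and the load_trusted_yaml() helper are hypothetical.

```ruby
require "yaml"

# Hypothetical class name; mirrors the cache-clearing cheat described above.
class YamlFriendlyProc
  def initialize(source)
    @source = source
    @proc = nil
  end

  def to_proc
    @proc ||= eval("lambda { #{@source} }")
  end

  def call(*args)
    to_proc.call(*args)
  end

  # Drop the cached Proc so only the source String is emitted.
  def to_yaml(*args)
    @proc = nil
    super
  end
end

# Assumption: newer Psych versions refuse custom classes in YAML.load,
# so use unsafe_load where it exists. Only do this with trusted data.
def load_trusted_yaml(text)
  YAML.respond_to?(:unsafe_load) ? YAML.unsafe_load(text) : YAML.load(text)
end

doubler = YamlFriendlyProc.new("|x| x * 2")
doubler.call(4)                               # fills the cache
copy = load_trusted_yaml(doubler.to_yaml)     # cache is cleared before the dump
puts copy.call(21)
```

Clearing the cache is a side effect of serializing, so the original object also rebuilds its Proc on the next call.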

Taking one more step, Dominik Bathon builds the Proc in the constructor and
never has to recreate it:

require "delegate"
require "yaml"

class SProc < DelegateClass(Proc)

  attr_reader :proc_src

  def initialize(proc_src)
    super(eval("Proc.new { #{proc_src} }"))
    @proc_src = proc_src
  end

  def ==(other)
    @proc_src == other.proc_src rescue false
  end

  def inspect
    "#<SProc: #{@proc_src.inspect}>"
  end
  alias :to_s :inspect

  def marshal_dump
    @proc_src
  end

  def marshal_load(proc_src)
    initialize(proc_src)
  end

  def to_yaml(opts = {})
    YAML::quick_emit(self.object_id, opts) { |out|
      out.map("!rubyquiz.com,2005/SProc" ) { |map|
        map.add("proc_src", @proc_src)
      }
    }
  end

end

YAML.add_domain_type("rubyquiz.com,2005", "SProc") { |type, val|
  SProc.new(val["proc_src"])
}

Dominik uses the delegate library, instead of the method_missing() trick.
That's a two-step process. You can see the first step when SProc is defined to
inherit from DelegateClass(Proc), which sets a type for the object so delegate
knows which messages to forward. The second step is the first line of the
constructor, which passes the delegate object to the DelegateClass. That's the
instance that will receive forwarded messages. Dominik also defined a custom
==(), "because that doesn't really work with method_missing/delegate."
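The two steps, plus the Marshal hooks, can be tried in isolation. This is a trimmed-down sketch with a hypothetical class name and no YAML support, not Dominik's full solution.

```ruby
require "delegate"

# Hypothetical trimmed-down variant, Marshal support only.
# Step one: DelegateClass(Proc) tells delegate which messages to forward.
class DelegatedProc < DelegateClass(Proc)
  attr_reader :source

  def initialize(source)
    # Step two: hand over the object that will receive forwarded messages.
    super(eval("Proc.new { #{source} }"))
    @source = source
  end

  # Compare by source, since two Procs built from the same text are
  # still different objects.
  def ==(other)
    other.respond_to?(:source) && @source == other.source
  end

  def marshal_dump
    @source
  end

  def marshal_load(source)
    initialize(source)
  end
end

tripler = DelegatedProc.new("|x| x * 3")
puts tripler.call(2)        # forwarded to the wrapped Proc
puts tripler.arity          # so is every other public Proc method
clone = Marshal.load(Marshal.dump(tripler))
puts clone == tripler
```

Because the Proc is built once in the constructor, there is no per-call eval() and no cache to manage.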

Dominik's code uses a different interface to support Marshal, but does the same
thing I did, as you can see. The YAML support is different. SProc.to_yaml()
spits out a new YAML type that basically just emits the source. The code
outside of the class adds the YAML support to read this type back in, whenever
it is encountered. Here's what the class looks like when it's resting in a YAML
file:

!rubyquiz.com,2005/SProc
proc_src: |2-
|*args|
puts "Hello world"
print "Args: "
p args

The advantage here is that the YAML export procedure never touches the Proc so
it doesn't need to be hidden or removed and rebuilt.

Florian's solution is also worth mentioning, though it takes a completely different
road to solving the problem. Time and space don't allow me to recreate and
annotate the code here, but Florian described the premise well in the submission
message:

I wrote this a while ago and it works by extracting a proc's origin file
name and line number from its .inspect string and using the source code
(which usually does not have to be read from disc) -- it works with
procs generated in IRB, eval() calls and regular files. It does not work
from ruby -e and stuff like "foo".instance_eval "lambda {}".source
probably doesn't work either.

Usage:

code = lambda { puts "Hello World" }
puts code.source
Marshal.load(Marshal.dump(code)).call
YAML.load(code.to_yaml).call

The code itself is a fascinating read. It uses the relatively unknown
SCRIPT_LINES__ Hash, has great tricks like overriding eval() to capture that
source, and even implements a partial Ruby parser with standard libraries. I'm
telling you, that code reads like a good mystery novel for programmers. Don't
miss it!

One last point. I said in the quiz all this is just a hack, no matter how
useful it is. Dave Burt sent a message to ruby-talk along these lines:

Proc's documentation tells us that "Proc objects are blocks of code that
have been bound to a set of local variables." (That is, they are "closures"
with "bindings".) Do any of the proposed solutions so far store local
variables?

# That is, can the following Proc be serialized?
local_var = 42
code = proc { local_var += 1 } # <= what should that look like in YAML?
code.call #=> 43

An excellent point. These toys we're creating have serious limitations to be
sure. I assume this is the very reason Ruby's Procs cannot be serialized.
Using binding() might make it possible to work around this problem in some
instances, but there are clearly some Procs that cannot be cleanly serialized.
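The limitation is easy to reproduce. In this sketch the rebuild() helper is hypothetical: it evaluates source text either in its own scope or in a Binding the caller supplies.

```ruby
# Hypothetical helper: turn source text back into a Proc.
# Without a Binding the code is evaluated in rebuild()'s own scope,
# so the caller's local variables are invisible to it.
def rebuild(source, scope = nil)
  code = "Proc.new { #{source} }"
  scope ? eval(code, scope) : eval(code)
end

local_var = 42

# With the caller's Binding the rebuilt Proc sees local_var again.
bound = rebuild("local_var += 1", binding)
puts bound.call
puts local_var              # the caller's variable really was updated

# Without it, local_var is a fresh nil local inside the block.
begin
  rebuild("local_var += 1").call
rescue NameError => e       # NoMethodError is a subclass of NameError
  puts "closure lost: #{e.class}"
end
```

Passing a Binding only helps inside one process, since the Binding would have to be serialized too.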

My thanks to all who committed such wonderful code and discussion to this week's
quiz. I know I learned multiple new things and I hope others did too.

Tomorrow we have a quiz to sample some algorithmic fun...

6 Answers

why the lucky stiff

7/14/2005 3:33:00 PM

Ruby Quiz wrote:

>My thanks to all who committed such wonderful code and discussion to this week's
>quiz. I know I learned multiple new things and I hope others did too.
>
>
Good stuff, JEGII, Robin, Chris2, Dave.

I can also really sympathize with Chris' disgust over the
YAML.add_ruby_type methods... It is undergoing deprecation in favor of:

class SerializableProc
yaml_type "tag:rubyquiz.org,2005:SerializableProc"
end

_why

Christian Neukirchen

7/14/2005 3:54:00 PM

why the lucky stiff <ruby-talk@whytheluckystiff.net> writes:

> Ruby Quiz wrote:
>
>>My thanks to all who committed such wonderful code and discussion to this week's
>>quiz. I know I learned multiple new things and I hope others did too.
>>
>>
> Good stuff, JEGII, Robin, Chris2, Dave.
>
> I can also really sympathize with Chris' disgust over the
> YAML.add_ruby_type methods... It is undergoing deprecation in favor
> of:
>
> class SerializableProc
> yaml_type "tag:rubyquiz.org,2005:SerializableProc"
> end

And then #yaml_dump and #yaml_load? That would rule.

> _why
--
Christian Neukirchen <chneukirchen@gmail.com> http://chneuk...

why the lucky stiff

7/14/2005 3:58:00 PM

Christian Neukirchen wrote:

> And then #yaml_dump and #yaml_load? That would rule.

Class.yaml_new or Object.yaml_initialize. And Object.to_yaml.

If folks prefer the Marshal setup, though, I'll change it. It's only
been like this for a handful of minor releases.

_why

Jeffrey Moss

7/14/2005 4:34:00 PM

Has anybody thought about serialized closures? I was thinking of a way to
use closures across multiple Apache requests, and came to the conclusion
that it was too much trouble. In this case I just use a standard Proc object
that gets re-initialized on each request and don't serialize it, but I
always thought it would be nice to maintain some sort of persistent state
across requests.

Wouldn't it be possible to write a C extension for serializable closures?

-Jeff


Florian Groß

7/14/2005 4:58:00 PM

Christian Neukirchen

7/14/2005 5:54:00 PM

why the lucky stiff <ruby-talk@whytheluckystiff.net> writes:

> Christian Neukirchen wrote:
>
>> And then #yaml_dump and #yaml_load? That would rule.
>
> Class.yaml_new or Object.yaml_initialize. And Object.to_yaml.
>
> If folks prefer the Marshal setup, though, I'll change it. It's only
> been like this for a handful of minor releases.

Very good too, I'm looking forward to that.

Does this get into 1.8.3 (if that version will ever appear)?

> _why
--
Christian Neukirchen <chneukirchen@gmail.com> http://chneuk...

comp.lang.ruby
