Jeffrey Moss
7/14/2005 4:34:00 PM
Has anybody thought about serialized closures? I was looking for a way to
use closures across multiple Apache requests, and came to the conclusion
that it was too much trouble. For now I just use a standard Proc object
that gets re-initialized on each request and is never serialized, but I've
always thought it would be nice to maintain some sort of persistent state
across requests.
Wouldn't it be possible to write a C extension for serializable closures?
-Jeff
----- Original Message -----
From: "Ruby Quiz" <james@grayproductions.net>
To: "ruby-talk ML" <ruby-talk@ruby-lang.org>
Sent: Thursday, July 14, 2005 6:51 AM
Subject: [SUMMARY] SerializableProc (#38)
> The solutions this time show some interesting differences in approach, so I want
> to walk through a handful of them below. The very first solution was from Robin
> Stocker and that's a fine place to start. Here's the class:
>
> class SerializableProc
>
>   def initialize( block )
>     @block = block
>     # Test if block is valid.
>     to_proc
>   end
>
>   def to_proc
>     # Raises exception if block isn't valid, e.g. SyntaxError.
>     eval "Proc.new{ #{@block} }"
>   end
>
>   def method_missing( *args )
>     to_proc.send( *args )
>   end
>
> end
>
> It can't get much simpler than that. The main idea here, and in all the
> solutions, is that we need to capture the source of the Proc. The source is
> just a String so we can serialize that with ease and we can always create a new
> Proc if we have the source. In other words, Robin's main idea is to go
> (syntactically) from this:
>
> Proc.new {
> puts "Hello world!"
> }
>
> To this:
>
> SerializableProc.new %q{
> puts "Hello world!"
> }
>
> In the first pure Ruby version we're building a Proc with the block of code to
> define the body. In the second SerializableProc version, we're just passing a
> String to the constructor that can be used to build a block. Christian
> Neukirchen had something very interesting to say about the change:
>
> Obvious problems of this approach are the lack of closures and editor
> support (depending on the inverse quality of your editor :P)...
>
> We'll get back to the lack of closures issue later, but I found the "inverse
> quality of your editor" claim interesting. The meaning is that a poor editor
> may not consider %q{...} equivalent to '...'. If it doesn't realize a String is
> being entered, it may continue to syntax highlight the code inside. Of course,
> you could always remove the %q whenever you want to see the code highlighting,
> but that's tedious.
>
> Getting back to Robin's class, initialize() just stores the String and creates a
> Proc from it so an Exception will be thrown at construction time if fed invalid
> code. The method to_proc() is what builds the Proc object by wrapping the
> String in "Proc.new { ... }" and calling eval(). Finally, method_missing() makes
> SerializableProc behave close to a Proc. Anytime it sees a method call that
> isn't initialize() or to_proc(), it creates a Proc object and forwards the
> message.
>
> We don't see anything specific to Serialization in Robin's code, because both
> Marshal (PStore uses Marshal) and YAML can handle a custom class with String
> instance data. Like magic, it all just works.
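
The "it all just works" point can be checked with a short round trip through Marshal. This is only a sketch: it reuses Robin's class as quoted above, and the sample source string and the variable names (original, restored) are made up for illustration.

```ruby
# Robin's class as quoted above: the only instance data is a String.
class SerializableProc
  def initialize(block)
    @block = block
    to_proc # raises early if the source does not parse
  end

  def to_proc
    eval "Proc.new { #{@block} }"
  end

  def method_missing(*args)
    to_proc.send(*args)
  end
end

# Illustrative source string; any valid block body would do.
original = SerializableProc.new %q{ |x| x * 2 }

# No custom dump hooks are needed, because Marshal only has to store @block.
restored = Marshal.load(Marshal.dump(original))
puts restored.call(21) # prints 42
```

Every call on restored still goes through method_missing() and builds a fresh Proc, which is the speed concern Robin raises next.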
>
> Robin had a complaint though:
>
> I imagine my solution is not very fast, as each time a method on the
> SerializableProc is called, a new Proc object is created.
>
> The object could be saved in an instance variable @proc so that speed is
> only low on the first execution. But that would require the definition of
> custom dump methods for each Dumper so that it would not attempt to dump
> @proc.
>
> My own solution, like some of the others, caches the Proc and defines some
> custom dump methods. Let's have a look at how something like that comes out:
>
> class SerializableProc
>   # Marshal hook: rebuild the instance from the dumped source String.
>   def self._load( proc_string )
>     new(proc_string)
>   end
>
>   def initialize( proc_string )
>     @code = proc_string
>     @proc = nil
>   end
>
>   # Marshal hook: only the source is dumped, never the cached Proc.
>   def _dump( depth )
>     @code
>   end
>
>   def method_missing( method, *args )
>     if to_proc.respond_to? method
>       @proc.send(method, *args)
>     else
>       super
>     end
>   end
>
>   def to_proc( )
>     return @proc unless @proc.nil?
>
>     # The /m flag lets .* span newlines, so multi-line sources still match.
>     if @code =~ /\A\s*(?:lambda|proc)(?:\s*\{|\s+do).*(?:\}|end)\s*\Z/m
>       @proc = eval @code
>     elsif @code =~ /\A\s*(?:\{|do).*(?:\}|end)\s*\Z/m
>       @proc = eval "lambda #{@code}"
>     else
>       @proc = eval "lambda { #{@code} }"
>     end
>   end
>
>   # Drop the cached Proc before YAML sees the object.
>   def to_yaml( )
>     @proc = nil
>     super
>   end
> end
>
> My initialize() is the same, save that I create a variable to hold the Proc
> object and I wasn't clever enough to trigger the early Exception when the code
> is bad. My to_proc() looks scary but I just try to accept a wider range of
> Strings, wrapping them in only what they need. The end result is the same.
> Note that any Proc created is cached. My method_missing() is also very similar.
> If the Proc object responds to the method, it is forwarded. The first line of
> method_missing() calls to_proc() to ensure we've created one. After that, it
> can safely use the @proc variable.
>
> The _load() class method and _dump() instance method are what it takes to
> support Marshal. First, _dump() is expected to return a String that could be
> used to rebuild the instance. Then, _load() is passed that String on reload and
> expected to return the recreated instance. The String choice is simple in this
> case, since we're using the source.
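
To see the _dump()/_load() pair in isolation, here is a trimmed-down sketch. CachedProc and its call() method are hypothetical names used only for this example, not code from any submission; the point is that a cached Proc in an instance variable does not get in Marshal's way once _dump() takes over.

```ruby
# Hypothetical minimal class showing only the Marshal hooks and the cache.
class CachedProc
  # Called by Marshal.load with the String that _dump returned.
  def self._load(source)
    new(source)
  end

  def initialize(source)
    @source = source
    @proc = nil
  end

  # Called by Marshal.dump; must return a String.
  def _dump(_depth)
    @source
  end

  def to_proc
    @proc ||= eval("lambda { #{@source} }")
  end

  def call(*args)
    to_proc.call(*args)
  end
end

adder = CachedProc.new("|a, b| a + b")
adder.call(1, 2) # fills the @proc cache before dumping

copy = Marshal.load(Marshal.dump(adder))
puts copy.call(3, 4) # prints 7
```

Without _dump(), Marshal would try to dump @proc along with @source and fail, since a Proc cannot be marshaled.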
>
> There are multiple ways to support YAML serialization, but I opted for the super
> simple cheat. YAML can't serialize a Proc, but it's just a cache that can
> always be restored. I just override to_yaml() and clear the cache before
> handing serialization back to the default method. My code is unaffected by the
> Proc's absence and it will recreate it when needed.
>
> Taking one more step, Dominik Bathon builds the Proc in the constructor and
> never has to recreate it:
>
> require "delegate"
> require "yaml"
>
> class SProc < DelegateClass(Proc)
>
>   attr_reader :proc_src
>
>   def initialize(proc_src)
>     super(eval("Proc.new { #{proc_src} }"))
>     @proc_src = proc_src
>   end
>
>   def ==(other)
>     @proc_src == other.proc_src rescue false
>   end
>
>   def inspect
>     "#<SProc: #{@proc_src.inspect}>"
>   end
>   alias :to_s :inspect
>
>   def marshal_dump
>     @proc_src
>   end
>
>   def marshal_load(proc_src)
>     initialize(proc_src)
>   end
>
>   def to_yaml(opts = {})
>     YAML::quick_emit(self.object_id, opts) { |out|
>       out.map("!rubyquiz.com,2005/SProc" ) { |map|
>         map.add("proc_src", @proc_src)
>       }
>     }
>   end
>
> end
>
> YAML.add_domain_type("rubyquiz.com,2005", "SProc") { |type, val|
>   SProc.new(val["proc_src"])
> }
>
> Dominik uses the delegate library, instead of the method_missing() trick.
> That's a two step process. You can see the first step when SProc is defined to
> inherit from DelegateClass(Proc), which sets a type for the object so delegate
> knows which messages to forward. The second step is the first line of the
> constructor, which passes the delegate object to the DelegateClass. That's the
> instance that will receive forwarded messages. Dominik also defined a custom
> ==(), "because that doesn't really work with method_missing/delegate."
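
The two delegate steps can be tried on their own with a cut-down class. SourceProc and its source reader are hypothetical names for this sketch; only DelegateClass and the super call come from the delegate library.

```ruby
require "delegate"

# Step one: DelegateClass(Proc) tells delegate which methods to forward.
class SourceProc < DelegateClass(Proc)
  attr_reader :source

  def initialize(source)
    # Step two: hand the real Proc to the delegator via super.
    super(eval("Proc.new { #{source} }"))
    @source = source
  end
end

greeter = SourceProc.new(%q{ |name| "Hello, #{name}!" })

# call() is not defined on SourceProc itself; it is forwarded to the Proc.
puts greeter.call("world") # prints Hello, world!
```

Because the Proc is built once in the constructor, each forwarded call reuses it, unlike the method_missing() version that evaluates the source again every time.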
>
> Dominik's code uses a different interface to support Marshal, but does the same
> thing I did, as you can see. The YAML support is different. SProc#to_yaml()
> spits out a new YAML type that basically just emits the source. The code
> outside of the class adds the YAML support to read this type back in, whenever
> it is encountered. Here's what the class looks like when it's resting in a YAML
> file:
>
> !rubyquiz.com,2005/SProc
> proc_src: |2-
>   |*args|
>   puts "Hello world"
>   print "Args: "
>   p args
>
> The advantage here is that the YAML export procedure never touches the Proc so
> it doesn't need to be hidden or removed and rebuilt.
>
> Florian's solution is also worth mentioning, though it takes a completely
> different road to solving the problem. Time and space don't allow me to recreate
> and annotate the code here, but Florian described the premise well in the
> submission message:
>
> I wrote this a while ago and it works by extracting a proc's origin file
> name and line number from its .inspect string and using the source code
> (which usually does not have to be read from disc) -- it works with
> procs generated in IRB, eval() calls and regular files. It does not work
> from ruby -e and stuff like "foo".instance_eval "lambda {}".source
> probably doesn't work either.
>
> Usage:
>
> code = lambda { puts "Hello World" }
> puts code.source
> Marshal.load(Marshal.dump(code)).call
> YAML.load(code.to_yaml).call
>
> The code itself is a fascinating read. It uses the relatively unknown
> SCRIPT_LINES__ Hash, has great tricks like overriding eval() to capture that
> source, and even implements a partial Ruby parser with standard libraries. I'm
> telling you, that code reads like a good mystery novel for programmers. Don't
> miss it!
>
> One last point. I said in the quiz that all this is just a hack, no matter how
> useful it is. Dave Burt sent a message to Ruby Talk along these lines:
>
> Proc's documentation tells us that "Proc objects are blocks of code that
> have been bound to a set of local variables." (That is, they are "closures"
> with "bindings".) Do any of the proposed solutions so far store local
> variables?
>
> # That is, can the following Proc be serialized?
> local_var = 42
> code = proc { local_var += 1 } # <= what should that look like in YAML?
> code.call #=> 43
>
> An excellent point. These toys we're creating have serious limitations to be
> sure. I assume this is the very reason Ruby's Procs cannot be serialized.
> Using binding() might make it possible to work around this problem in some
> instances, but there are clearly some Procs that cannot be cleanly serialized.
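
Dave's example can be turned into a small experiment showing what a source-only approach loses and what binding() gives back. rebuild() is a hypothetical helper written for this sketch, and the source string stands in for the serialized form of the Proc.

```ruby
# Hypothetical helper: rebuilds a lambda from source, optionally inside a
# caller-supplied Binding. Without one, eval() runs in this method's scope.
def rebuild(source, scope = nil)
  code = "lambda { #{source} }"
  scope ? eval(code, scope) : eval(code)
end

local_var = 42
source = "local_var + 1"

# Rebuilt from source alone: the closure over local_var is gone.
detached = rebuild(source)
outcome =
  begin
    detached.call
  rescue NameError => e
    "NameError: #{e.name}"
  end
puts outcome # prints NameError: local_var

# Rebuilt inside the defining scope: local_var is visible again.
attached = rebuild(source, binding)
puts attached.call # prints 43
```

This only helps while the original scope is still alive in the same process. A Binding cannot be dumped with Marshal, so the local variables would still have to be captured and stored some other way to survive a round trip.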
>
> My thanks to all who committed such wonderful code and discussion to this week's
> quiz. I know I learned multiple new things and I hope others did too.
>
> Tomorrow we have a quiz to sample some algorithmic fun...
>