Bill Kelly
5/17/2007 6:56:00 PM
From: "Tim Pease" <tim.pease@gmail.com>
>
> You could do a mmap solution. Modify Hash such that []= does a
> Marshal.dump of your object, stores the object into the mmap, and then
> that memory location is stored in the Hash instead of the object.
>
> [] must also be modified to take the memory location, Marshal.load the
> object from the mmap, and then return the object.
>
> The hard part of this is doing the memory management of the mmap --
> when an object is deleted from the hash, removing it from the mmap;
> consolidating unused mmap regions; etc. All the standard MMU stuff
> you normally don't have to deal with in Ruby.
>
> It would be much easier to implement if all the objects being stored
> in the Hash were guaranteed to be the same size. Then you would just
> need a free/allocated array to keep track of what can go where in the
> mmap.
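
For reference, the fixed-size case described above might look roughly like the sketch below. It is only an illustration under some assumptions: Ruby's standard library has no mmap binding, so a plain File with seek/read/write stands in for the mapped region, and the names SlotHash, SLOT_SIZE and HEADER are made up for the example.

```ruby
require "tempfile"

# Hypothetical sketch: Hash-like store keeping Marshal'd values in fixed-size
# slots of a backing file. The file stands in for the mmap region (assumption).
class SlotHash
  SLOT_SIZE = 256   # assumed upper bound for one slot (length prefix + data)
  HEADER    = 4     # 32-bit length prefix stored at the start of each slot

  def initialize(io, slots)
    @io    = io
    @index = {}                 # key => slot number (the "memory location")
    @free  = (0...slots).to_a   # free list: slots not currently allocated
  end

  # Marshal the value, write it into a slot, remember the slot under the key.
  def []=(key, value)
    data = Marshal.dump(value)
    raise ArgumentError, "value too large for slot" if data.bytesize > SLOT_SIZE - HEADER
    slot = @index[key] || @free.shift
    raise "store full" if slot.nil?
    @io.seek(slot * SLOT_SIZE)
    @io.write([data.bytesize].pack("N") + data)
    @index[key] = slot
    value
  end

  # Look up the slot, read the marshalled bytes back, rebuild the object.
  def [](key)
    slot = @index[key]
    return nil if slot.nil?
    @io.seek(slot * SLOT_SIZE)
    len = @io.read(HEADER).unpack1("N")
    Marshal.load(@io.read(len))
  end

  # Deleting just returns the slot to the free list; nothing is compacted.
  def delete(key)
    slot = @index.delete(key)
    @free.push(slot) unless slot.nil?
    slot
  end
end

Tempfile.create("slots") do |f|
  f.binmode
  h = SlotHash.new(f, 4)
  h[:point] = { "x" => 1, "y" => 2 }
  h[:list]  = [1, 2, 3]
  h.delete(:point)        # the slot goes back on the free list
  h[:name] = "reused"     # may land in any free slot
end
```

Note that the key-to-slot index here still lives in an ordinary Ruby Hash, so only the values are in the file; making the index itself persistent is the harder part.
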
Agreed. But ironically, what gets me is that with a modern
VMM, this is exactly what is already going on with Ruby's hash
in memory. Except that the backing store is the system swap
file, and so, not persistent.
In principle, I just want to change the backing store to a
memory mapped file, instead. :-)
I've wondered what would happen if one took a nice malloc
implementation, made it operate inside a heap that was
memory-mapped onto a file, and then took something like the
STL hash_map (or ruby's hash) and wired it to the malloc
allocating from the memory-mapped file.
Intuitively, it seems it would have no choice but to perform
fantastically, as long as the whole file could be mapped
into memory.
However, once the file size exceeded available memory, I
can imagine that it might degrade to sub-optimal performance.
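
As a toy illustration of the allocator-inside-the-file idea, here is a rough sketch in Ruby. It makes several assumptions: a binary String stands in for the mapped region, the allocator is a trivial bump allocator rather than a real malloc, and the Arena class and its methods are invented for the example. The point it tries to show is that the allocator's own bookkeeping lives inside the region, and that references are stored as offsets rather than addresses, so they stay valid if the region is mapped somewhere else later.

```ruby
# Hypothetical sketch: a bump allocator whose state lives inside the region
# it manages. A binary String stands in for the memory-mapped file.
class Arena
  HEADER = 8   # first 8 bytes hold the offset of the next free byte

  attr_reader :bytes

  def self.create(size)
    new("\0" * size)
  end

  def initialize(bytes)
    @bytes = bytes.b                  # binary copy of the region
    self.top = HEADER if top.zero?    # all-zero header means a fresh region
  end

  # Copy data into the region and return its offset. Other structures
  # should store this offset, never a raw address.
  def store(data)
    data = data.b
    off  = top
    need = 4 + data.bytesize          # 32-bit length prefix + payload
    raise "arena full" if off + need > @bytes.bytesize
    @bytes[off, need] = [data.bytesize].pack("N") + data
    self.top = off + need
    off
  end

  # Read back the bytes stored at a given offset.
  def fetch(off)
    len = @bytes[off, 4].unpack1("N")
    @bytes[off + 4, len]
  end

  private

  def top
    @bytes[0, HEADER].unpack1("Q>")
  end

  def top=(value)
    @bytes[0, HEADER] = [value].pack("Q>")
  end
end

arena    = Arena.create(1024)
offset   = arena.store("hello")
reopened = Arena.new(arena.bytes)     # simulates mapping the same file again
reopened.fetch(offset)                # the stored bytes come back unchanged
```

A real version would need a free list, alignment, and some story for growing the file, which is where an existing malloc implementation would come in.
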
Along these lines, I've also wondered if one could get a
ruby application to persist similarly (in principle!)
by wiring ruby's memory allocation functions to a malloc
that was allocating from a memory-mapped file. Of course
the tricky part would be dealing with all the objects
containing system resources that can't be persisted,
such as file handles, etc. etc. Probably a nightmare in
practical terms, unless the language and its libraries
were designed that way from the start...
Ah, well. In talking about this, it seems there are really
two scenarios for memory-mapped persistent hashes. One when
all the pages can fit in RAM; and the other when the filesize
would greatly exceed available RAM (and even worse, when
the filesize exceeds the maximum contiguous block that
can even be mapped into the process address space at all).
Hmm...
Regards,
Bill