Asp Forum - querying persistent ruby objects in memory

braver

5/26/2007 9:01:00 PM

I have a data-mining task which loads data as a big XML tree (10+ MB)
and then reorganizes it. Even loading it with Hpricot takes 10-20
seconds.  I don't want to do it for every manipulation I want to try,
especially for sequences of transformations.

Thus I wonder what's a good way to keep the huge object in memory
between the runs of querying scripts. Can Rails be used for that?
I'd rather avoid writing a client-server platform, or using it per se,
unless there's already an existing one. A vague intuition is, it
should be something like threads -- one thread parses XML and keeps it
in memory, another starts up later, somehow joins the memory space of
the first one, queries/transforms it, and ends. Then other queries/
transformations can all be run. Is there anything like it?

Cheers,
Alexy

4 Answers

Robert Klemme

5/26/2007 9:25:00 PM

On 26.05.2007 23:00, braver wrote:
> I have a data-mining task which loads data as a big XML tree (10+ MB)
> and then reorganizes it. Even loading it with Hpricot takes 10-20
> seconds.  I don't want to do it for every manipulation I want to try,
> especially for sequences of transformations.
>
> Thus I wonder what's a good way to keep the huge object in memory
> between the runs of querying scripts. Can Rails be used for that?
> I'd rather avoid writing a client-server platform, or using it per se,
> unless there's already an existing one. A vague intuition is, it
> should be something like threads -- one thread parses XML and keeps it
> in memory, another starts up later, somehow joins the memory space of
> the first one, queries/transforms it, and ends. Then other queries/
> transformations can all be run. Is there anything like it?

I'd consider using Marshal.
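
A minimal sketch of that idea: parse once, dump the result with Marshal, and have later runs load the dump instead of re-parsing. The names `parse_slowly` and `load_cached` are hypothetical, and the sketch assumes the parsed data is made of plain Ruby objects (hashes, arrays, strings); the thread does not say whether an Hpricot document itself can be marshalled.

```ruby
require 'tmpdir'

# Hypothetical stand-in for the expensive XML parse.
def parse_slowly
  { 'records' => (1..3).map { |i| { 'id' => i, 'name' => "item#{i}" } } }
end

# Return the cached object if a dump exists, otherwise build it and cache it.
def load_cached(cache_path)
  if File.exist?(cache_path)
    File.open(cache_path, 'rb') { |f| Marshal.load(f) }
  else
    data = parse_slowly
    File.open(cache_path, 'wb') { |f| Marshal.dump(data, f) }
    data
  end
end

cache  = File.join(Dir.tmpdir, "tree-cache-#{Process.pid}.bin")
first  = load_cached(cache) # slow path: parses and writes the dump
second = load_cached(cache) # fast path: reads the dump back
File.delete(cache)          # tidy up after the demonstration
```

Each querying script would call `load_cached` with the same path, so only the first run pays the parsing cost.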

Kind regards

robert

braver

5/26/2007 10:34:00 PM

On May 26, 2:24 pm, Robert Klemme <shortcut...@googlemail.com> wrote:
> I'd consider using Marshal.

That's just plain serialization, isn't it? I've seen that and
Madeleine; but my wish is to keep the objects in memory without the
need to dump/reload it, however fast. (That would be a last resort.)

The question is, can we keep an object in memory in one thread, and
explore/change it from another? In the worst case, we can probably
quickly dump an object into a memory region and reload it back via
Marshal -- I guess a crude solution is forming here, using shared
memory or RAM disk -- have to see what's there for macs... But still
I wonder what folks think in terms of all kinds of RAM persistence in
ruby solutions.

Cheers,
Alexy

James Tucker

5/27/2007 6:12:00 AM

Someone else was talking about this kind of problem the other day in
#ruby-lang.

Someone there posted an elegant solution to the problem (which, incidentally,
was rejected because it relies on a separate process). Here it is anyway:

#!ruby
# fork needs the win32-process library on Windows
if RUBY_PLATFORM.include? 'mswin32'
  begin
    require 'win32/process'
  rescue LoadError
    raise 'You need to install win32/process'
  end
end

# parent forks off and dies, leaving child as daemon
exit 0 if !fork.nil?

# daemon code starts here
require 'drb/drb'
require 'thread'
require 'server' # local server.rb defining Server (not shown in this post)

$SAFE = 1 # disable eval() and friends on tainted input (no effect on Ruby 3.0+)

DRb.start_service("druby://:2020", Server.new)
puts DRb.uri
DRb.thread.join
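
The script above requires a local `server.rb` that the post does not show. Below is a rough sketch of what such a front object and a matching client call might look like. The `Server` body, its `titles` method and the sample data are all invented for illustration; only the DRb calls are real. To stay runnable as a single file, the sketch binds to a free port and queries itself in one process, whereas a real query script would connect from a separate process.

```ruby
require 'drb/drb'

# Hypothetical front object; the real server.rb is not shown in the thread.
class Server
  def initialize
    # Stand-in for the parsed XML tree: loaded once, then kept in memory.
    @tree = { 'books' => [{ 'title' => 'A' }, { 'title' => 'B' }] }
  end

  # The query runs inside the server process; only the result is sent back.
  def titles
    @tree['books'].map { |b| b['title'] }
  end
end

# Port 0 asks the OS for any free port; the original script uses 2020.
DRb.start_service('druby://localhost:0', Server.new)

# A query script would normally do this part from its own process.
remote = DRbObject.new_with_uri(DRb.uri)
titles = remote.titles
puts titles.inspect

DRb.stop_service
```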

Francis Cianfrocca wrote:
> On 5/26/07, braver <deliverable@gmail.com> wrote:
>>
>> On May 26, 2:24 pm, Robert Klemme <shortcut...@googlemail.com> wrote:
>> > I'd consider using Marshal.
>>
>> That's just plain serialization, isn't it? I've seen that and
>> Madeleine; but my wish is to keep the objects in memory without the
>> need to dump/reload it, however fast. (That would be a last resort.)
>>
>> The question is, can we keep an object in memory in one thread, and
>> explore/change it from another? In the worst case, we can probably
>> quickly dump an object into a memory region and reload it back via
>> Marshal -- I guess a crude solution is forming here, using shared
>> memory or RAM disk -- have to see what's there for macs... But still
>> I wonder what folks think in terms of all kinds of RAM persistence in
>> ruby solutions.
>
> Aren't you overengineering a little? You want to amortize a ten-second
> startup cost over a (presumably) large number of operations against some
> dataset. But you keep talking about threads. That tells me that your
> process will run for a long time and will know all the operations it has
> to execute upfront. In that case, forget about threads and just serialize
> your operations. Your life will be much simpler.
>
> But on the other hand, you talk about shared memory and about not wanting
> to write a client/server application. That suggests that you're thinking
> of keeping this dataset around and having other PROCESSES send requests to
> it at arbitrary times. In that case, don't use threads either, or shared
> memory for that matter. Life is too short to debug all that stuff. Write
> yourself a little client-server application and be done with it. If you
> don't want to deal with the network programming, use EventMachine.
>
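
For the first case Francis describes (all operations known upfront), "serialize your operations" could be as simple as the sketch below: load once, then apply the transformations one after another in the same process. The sample data and the three steps are made up for illustration.

```ruby
# Stand-in for the one-time, slow parse.
data = [5, 3, 8, 1]

# Each transformation takes the current data and returns the next version.
steps = [
  ->(d) { d.sort },
  ->(d) { d.map { |x| x * 2 } },
  ->(d) { d.select { |x| x > 4 } }
]

# Run the steps in order, feeding each result into the next step.
result = steps.reduce(data) { |acc, step| step.call(acc) }
```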

Robert Klemme

5/27/2007 8:38:00 AM

On 27.05.2007 00:33, braver wrote:
> On May 26, 2:24 pm, Robert Klemme <shortcut...@googlemail.com> wrote:
>> I'd consider using Marshal.
>
> That's just plain serialization, isn't it? I've seen that and
> Madeleine; but my wish is to keep the objects in memory without the
> need to dump/reload it, however fast. (That would be a last resort.)

I find that odd. Keeping something in memory is usually a *solution*
for some kind of *business requirement* (e.g. to make things fast). Why
would you want to keep something in mem if it can be persisted on disk
really fast? I don't know the volume of what you need to handle but did
you actually try out how fast it is?

> The question is, can we keep an object in memory in one thread, and
> explore/change it from another?

Yes, of course. Easily sharing memory is one (if not *the*) major
aspect of multithreaded applications. But reading your other posting I
am not sure whether you have the proper idea of MT programming. If you
only want to do one set of manipulations at a time you do not need
multiple threads because there is no concurrency involved.
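
To illustrate the point about threads sharing memory, here is a small sketch in which one thread builds an object and a second thread, started later, changes that very same object. The hash is a stand-in for the parsed tree. Note that threads live inside a single process, so this by itself does not keep anything alive between separate script runs.

```ruby
tree = nil

loader = Thread.new do
  # Stand-in for the slow Hpricot parse.
  tree = { 'items' => [3, 1, 2] }
end
loader.join # wait until loading has finished

worker = Thread.new do
  tree['items'].sort! # mutates the very same object, no copy involved
  tree['items'].first
end
smallest = worker.value # joins the thread and returns the block's result
```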

> In the worst case, we can probably
> quickly dump an object into a memory region and reload it back via
> Marshal -- I guess a crude solution is forming here, using shared
> memory or RAM disk -- have to see what's there for macs... But still
> I wonder what folks think in terms of all kinds of RAM persistence in
> ruby solutions.

As James suggested, using DRb is one option.  Then you can decide whether
to manipulate the object graph in the server process or send it off to
the client (and probably send it back after doing your changes).  It's
probably the best solution in your case because you can start arbitrary
client processes and manipulate state in the server.  But you should
make sure that access is properly synchronized to cope with multiple
clients that connect concurrently.
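
One way to get that synchronization is to guard the shared state with a Mutex inside the front object, since DRb can serve several client connections at once. This is only a sketch under stated assumptions: the `SyncedStore` class and its methods are hypothetical, and the "clients" are simulated with threads in one process so the example can run standalone.

```ruby
require 'drb/drb'

# Hypothetical front object that guards the shared data with a Mutex.
class SyncedStore
  def initialize(items)
    @items = items
    @lock  = Mutex.new
  end

  # Only one caller at a time may modify the shared array.
  def add(item)
    @lock.synchronize do
      @items << item
      @items.size
    end
  end

  # Hand out a copy so callers never see a half-updated array.
  def snapshot
    @lock.synchronize { @items.dup }
  end
end

# Port 0 asks the OS for any free port.
DRb.start_service('druby://localhost:0', SyncedStore.new([]))
store = DRbObject.new_with_uri(DRb.uri)

# Simulate several clients calling in concurrently.
clients = (1..5).map { |i| Thread.new { store.add(i) } }
clients.each(&:join)
result = store.snapshot.sort

DRb.stop_service
```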

Kind regards

robert

comp.lang.ruby
