Robert Klemme
1/8/2008 2:22:00 PM
2008/1/8, Carlos J. Hernandez <carlosjhr64@fastmail.fm>:
> Robert:
> Thanks for your performance improvement suggestion.
> I did not think of giving Marshal $stdout.
> But the problem remains that I don't know ahead of time how many bytes
No, this is not a problem because Marshal.load will take care of this
(as you can see from the command line example I posted).
> the Marshal data will have and
> I can no longer use "\n", the input line separator, as a record
> separator.
Not needed as said before.
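To illustrate why no separator or length prefix is needed, here is a minimal sketch. The records are made up, and a StringIO stands in for the pipe (in a real pipeline the writer would dump to $stdout and the reader would load from $stdin). Each call to Marshal.load consumes exactly one dumped object, so a "\n" inside the data is harmless.

```ruby
require 'stringio'

# Hypothetical records; one deliberately contains a newline.
records = [
  ["2008-01-08", 12.5, 100],
  "line one\nline two",
  { "symbol" => "YHOO", "close" => 23.72 }
]

# Writer side: dump each record straight to the IO, with no separator.
# A binary StringIO stands in for the pipe here (assumption for the sketch).
io = StringIO.new(String.new)
records.each { |rec| Marshal.dump(rec, io) }

# Reader side: each Marshal.load reads exactly one object from the stream.
io.rewind
restored = []
restored << Marshal.load(io) until io.eof?
```

The reader never needs to know how many bytes a record occupies; the Marshal format carries that information itself.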
> As for general usefulness.
> If you already have a general purpose cat, filter, transform, and sort
> programs...
> And just want to see the results of manipulating the contents of some
> source file....
> Then just say
> cat source.txt | transform | filter | sort > result.txt
... and get another "useless cat award". :-)
> I do this kind of thing all the time, I just have not programmed that way
> before.
> I just started because the model is useful in my data downloads where
> I download history CSVs from Finance.Yahoo.com and along the way to
> append to my data files,
> I transform the data.
> There is an impedance problem though,
> in having to flatten and convert a data structure that contains floats,
> integers, and dates,
> back to a CSV line every time you go through the pipe, and then restore
> it back in the receiver.
> Marshal solves this, except that "\n" can no longer be used as record
> separators.
Marshal basically just hides the conversion and makes it faster. The
conversion is still there: you have a data structure (say an array),
transform it into a sequence of bytes (either CSV or Marshal format),
send it through a pipe, transform byte sequence back (either from CSV
or Marshal format) and get out the array again. That's why I say it's
more efficient to not use two processes but do it in one Ruby process
most of the time (i.e. on a single-core machine or with IO-bound stuff).
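A rough sketch of the two round trips, assuming a made-up row and using a naive comma join in place of a real CSV writer. In both cases the array becomes bytes and is rebuilt on the other side; Marshal just does the type restoration for you.

```ruby
require 'date'

# Hypothetical row with a date, a float and an integer.
row = [Date.new(2008, 1, 8), 23.72, 1_000]

# CSV-style: every field is flattened to text ...
csv_line = row.join(",")
fields = csv_line.split(",")
# ... and the receiver has to convert each field back by hand.
from_csv = [Date.parse(fields[0]), Float(fields[1]), Integer(fields[2])]

# Marshal: the same two conversions happen, but the types are restored for you.
bytes = Marshal.dump(row)
from_marshal = Marshal.load(bytes)
```

Either way there is a serialize step and a deserialize step per pipe hop, which is the cost a single-process solution avoids.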
> Marshal is more efficient, that's why someone wrote it.
Not only that. Marshal serves a slightly different purpose, namely
converting object graphs which can contain loops into a byte stream
and resurrecting this graph from the byte stream.
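A small sketch of what that means in practice, with a made-up graph: two hashes that point at each other, plus one string referenced twice. A line-based text format has no natural way to express this, while Marshal restores both the loop and the shared reference.

```ruby
# Hypothetical object graph containing a loop.
first  = { "name" => "first" }
second = { "name" => "second", "peer" => first }
first["peer"] = second            # first -> second -> first

shared = "shared"
graph = { "root" => first, "tags" => [shared, shared] }

copy = Marshal.load(Marshal.dump(graph))

# The loop survives: following "peer" twice leads back to the same object.
loop_kept = copy["root"]["peer"]["peer"].equal?(copy["root"])
# Shared references survive too: both slots hold one and the same string.
sharing_kept = copy["tags"][0].equal?(copy["tags"][1])
```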
> Lastly, computers will be multi-processing from here on...
> Faster chips are finding their physical limits.
But OTOH Ruby will rather sooner than later use native threads and a
multithreaded application is easier and in this particular case also
more efficient (unless you use tons of memory per processing step)
because you do not need the conversion for IPC. Do you actually
/need/ that processing power?
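As a sketch of the multithreaded alternative, assuming two made-up stages (a transform and a filter) connected by queues inside one Ruby process. The rows travel between stages as ordinary objects, so nothing is serialized along the way. The stage logic and the :done end marker are inventions for this example only.

```ruby
# Hypothetical input rows: [date string, closing price].
rows = [["2008-01-07", 23.5], ["2008-01-08", 24.0], ["2008-01-09", 22.0]]

to_filter = Queue.new
results   = Queue.new

# Stage 1: transform (convert the price to cents), then signal the end.
transformer = Thread.new do
  rows.each { |date, close| to_filter << [date, close * 100] }
  to_filter << :done
end

# Stage 2: filter (keep rows at or above 2300 cents), then signal the end.
filterer = Thread.new do
  while (row = to_filter.pop) != :done
    results << row if row[1] >= 2300
  end
  results << :done
end

# Collect the output of the last stage in the main thread.
output = []
while (row = results.pop) != :done
  output << row
end
[transformer, filterer].each(&:join)
```

The queues play the role of the pipes, but what passes through them are references to live objects rather than byte streams.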
> BTW, I have an implementation of Marshal Pipes, just as I described in
> my opening email.
> It works great.
That's nice for you. But you proposed a general solution in your
original posting. At least that's what I picked up from your last
statements. With this (public!) discussion we are trying to find out
whether it *is* actually a good idea for the general audience. So far
I haven't been convinced that it is.
Kind regards
robert
--
use.inject do |as, often| as.you_can - without end