Robert Klemme
1/4/2009 7:29:00 PM
On 31.12.2008 17:28, cchayden.nyt@gmail.com wrote:
> When I run the program:
>
> STDIN.each_line { |line| line.split("\t") }
Is it really only this line? How do you feed the big file to stdin?
Does the big file actually _have_ lines? If not, you would likely
see that behavior, because each_line has to read until it finds a
line terminator.
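If the data is tab-delimited but has no newlines, one way out is to
pass an explicit record separator to each_line so it yields at every
tab instead of buffering the whole stream. A minimal sketch, using a
StringIO to stand in for STDIN (the sample input is made up):

```ruby
require 'stringio'

# Simulated terminator-free input; the original post reads from STDIN.
input = StringIO.new("a\tb\tc\t")

fields = []
# With "\t" as the separator, each_line yields after every tab,
# so only one field is ever buffered at a time.
input.each_line("\t") { |field| fields << field.chomp("\t") }
```

The same call works on STDIN directly: STDIN.each_line("\t") { |f| ... }.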
> with a big input file, the process size grows rapidly, reaching 1G in
> less than 1 minute.
> If I let it go, it continues to grow at this rate until the whole
> system fails.
What does this mean? Does the kernel panic? Or does the Ruby process
terminate with an error?
> This is ruby 1.8.6 on fedora 9.
You have not accidentally switched off GC completely, have you?
> Is this a known problem?
> Is there some way to work around it?
1.8.6 is not really current any more. I would upgrade if possible, or
at least get the latest patch level of that version. For more advice
we have a bit too little information, I am afraid.
When I try this with my cygwin version memory consumption stays at
roughly 3MB:
robert@fussel ~
$ perl -e 'foreach $i (1..10000000) {print $i, "--\t--\t--\t--\n";}' | ruby -e 'STDIN.each_line { |line| line.split("\t") }'
robert@fussel ~
$ ruby -v
ruby 1.8.7 (2008-08-11 patchlevel 72) [i386-cygwin]
robert@fussel ~
$
But if I read a file that does not have lines, the behavior is as you
describe: memory goes up and up.
ruby -e 'STDIN.each_line { |line| line.split("\t") }' </dev/zero
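If you cannot rely on any separator being present, reading in
fixed-size chunks keeps memory bounded regardless of the input. A
sketch with a carry-over buffer for fields split across chunk
boundaries (the StringIO and sample data are stand-ins for STDIN):

```ruby
require 'stringio'

# Hypothetical input; in the original post this would be STDIN.
input = StringIO.new("a\tb\tc")

buffer = ""
fields = []
while (chunk = input.read(4096))
  buffer << chunk
  parts = buffer.split("\t", -1)
  # The last piece may be an incomplete field straddling the chunk
  # boundary, so keep it in the buffer for the next iteration.
  buffer = parts.pop
  fields.concat(parts)
end
fields << buffer unless buffer.empty?
```

At most one chunk plus one partial field is held in memory at a time,
so this stays flat even on /dev/zero style input.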
Kind regards
robert
--
remember.guy do |as, often| as.you_can - without end