Asp Forum - Parsing a file with look ahead

S. Robert James

2/22/2007 12:35:00 AM

I need to parse a file line by line, and output the results line by
line (too big to fit into memory). So far, simple enough:
file.each_line.

However, the parser needs the ability to peek ahead to the next line,
in order to parse this line. What's the right way to do this? Again,
I really don't want to try to slurp the whole file into memory and
split on newlines.

Here's an example:
Line1: Hi
Line2: How
Line3: Are
Line4: you?

I'd like to:
parse('Hi', 'How')
parse('How', 'Are')
parse('Are', 'you?')
parse('you?', false)
# hey, this is practically a unit test!

Any ideas?

6 Answers

Carl Lerche

2/22/2007 12:38:00 AM

The first thing I can think of is to use file.each_line and store each
line in a previous_line variable at the end of the block. Then you have
access to both the line that was read beforehand and the current one.
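
A minimal sketch of that idea, assuming a hypothetical helper named each_with_next (not from the thread) and a StringIO standing in for the real file. Each line is handed to the block only once its successor has been read, and the last line is flushed after the loop with false as its lookahead:

```ruby
require 'stringio'

# Hypothetical helper: yields each line together with the line after it,
# or false when there is no following line.
def each_with_next(io)
  previous_line = nil
  io.each_line do |line|
    line = line.chomp
    # The previous line can be handled now that its successor is known.
    yield previous_line, line unless previous_line.nil?
    previous_line = line
  end
  # The last line has no successor.
  yield previous_line, false unless previous_line.nil?
end

calls = []
each_with_next(StringIO.new("Hi\nHow\nAre\nyou?\n")) do |current, following|
  calls << [current, following]
end
p calls
# => [["Hi", "How"], ["How", "Are"], ["Are", "you?"], ["you?", false]]
```

Only two lines are held at any time, so the file size does not matter.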

--
EPA Rating: 3000 Lines of Code / Gallon (of coffee)

Florian Frank

2/22/2007 12:56:00 AM

S. Robert James wrote:
> I'd like to:
> parse('Hi', 'How')
> parse('How', 'Are')
> parse('Are', 'you?')
> parse('you?', false)
> # hey, this is practically a unit test!
>
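> Any ideas?

require 'enumerator'

# Reads the file two lines at a time (non-overlapping pairs);
# second is nil when the file has an odd number of lines.
File.new(filename).enum_slice(2).each do |first, second|
  p [ first, second ? second : false ]
end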

--
Florian Frank

S. Robert James

2/22/2007 1:39:00 AM

Thanks! BTW, looking at the Rdoc, it seems each_cons is what I want,
no?

Gregory Brown

2/22/2007 2:00:00 AM

On 2/21/07, S. Robert James <srobertjames@gmail.com> wrote:
> Thanks! BTW, looking at the Rdoc, it seems each_cons is what I want,
> no?

If you are dealing with paired lines, use enum_slice(2).

If you are dealing with data dependent on the current and previous
line, use each_cons, yes.
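
To make the difference concrete, here is a small sketch using the example lines from the question. It assumes a current Ruby, where each_slice and each_cons called without a block return enumerators (the counterparts of enum_slice and enum_cons), and it uses an in-memory array purely for illustration:

```ruby
lines = %w[Hi How Are you?]

# Non-overlapping pairs: each line appears exactly once.
slices = lines.each_slice(2).to_a
p slices
# => [["Hi", "How"], ["Are", "you?"]]

# Sliding window: each line is paired with the one that follows it.
conses = lines.each_cons(2).to_a
p conses
# => [["Hi", "How"], ["How", "Are"], ["Are", "you?"]]
```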

Daniel DeLorme

3/1/2007 1:42:00 AM

Gregory Brown wrote:
> On 2/21/07, S. Robert James <srobertjames@gmail.com> wrote:
>> Thanks! BTW, looking at the Rdoc, it seems each_cons is what I want,
>> no?
>
> If you are dealing with paired lines, use enum_slice(2)
>
> if you are dealing with data dependent on the current and previous
> line, use each_cons, yes.

Except each_cons(2) will iterate only 9 times if you have 10 lines, so
the last line never shows up as the current line.

Maybe something simple like this?

line = f.gets
while line
  nextline = f.gets   # nil once the file is exhausted
  # do stuff with line and nextline...
  line = nextline
end
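
As a rough check of both points, the sketch below runs the loop against the example from the question and counts the each_cons(2) iterations for comparison. The StringIO input and the calls array are stand-ins added for illustration, not part of the original suggestion:

```ruby
require 'stringio'

data = "Hi\nHow\nAre\nyou?\n"

# each_cons(2) over 4 lines yields only 3 pairs.
cons_pairs = StringIO.new(data).each_line.each_cons(2).count
p cons_pairs
# => 3

# The gets loop visits every line, including the last one.
f = StringIO.new(data)
calls = []
line = f.gets
while line
  nextline = f.gets   # nil at end of input
  calls << [line.chomp, nextline ? nextline.chomp : false]
  line = nextline
end
p calls.length
# => 4
p calls.last
# => ["you?", false]
```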

Daniel

Thomas Hafner

3/4/2007 2:31:00 AM

"S. Robert James" <srobertjames@gmail.com> wrote/schrieb <1172104495.175931.201910@p10g2000cwp.googlegroups.com>:

> I need to parse a file line by line, and output the results line by
> line (too big to fit into memory). So far, simple enough:
> file.each_line.
>
> However, the parser needs the ability to peek ahead to the next line,
> in order to parse this line. What's the right way to do this? Again,
> I really don't want to try to slurp the whole file into memory and
> split on newlines.

Sounds to me like it could be solved elegantly with a lazy stream of
input lines. For lazy streams, see the Usenet thread starting with
article <9oib84-sf6.ln1@faun.hafner.nl.eu.org>, for instance.

The file will be split into lines, but lazily, so the lines don't all
need to be held in memory at the same time. Old, i.e. already consumed,
lines will be garbage collected soon, because the application no longer
references them. You can have as many lookahead lines as you want
(tradeoff: more lookahead needs more memory, of course).
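
The stream implementation from that thread is not reproduced here. As a loosely related sketch, newer Ruby versions offer external enumerators with Enumerator#next and Enumerator#peek, which also read lines on demand. The helper name each_with_peek and the StringIO input are assumptions for illustration, and peek gives only a single line of lookahead:

```ruby
require 'stringio'

# Hypothetical helper: walks the lines lazily, peeking one line ahead.
def each_with_peek(io)
  lines = io.each_line   # enumerator; lines are read only when requested
  loop do                # loop ends quietly when next raises StopIteration
    current = lines.next.chomp
    following = begin
      lines.peek.chomp   # look at the next line without consuming it
    rescue StopIteration
      false              # no line after the current one
    end
    yield current, following
  end
end

seen = []
each_with_peek(StringIO.new("Hi\nHow\nAre\nyou?\n")) do |current, following|
  seen << [current, following]
end
p seen
# => [["Hi", "How"], ["How", "Are"], ["Are", "you?"], ["you?", false]]
```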

Regards
Thomas

comp.lang.ruby
