comp.lang.ruby

Text file parsing in ruby

Paul Van Delst

1/24/2007 3:33:00 PM

Hello,

As I use Ruby more and more for things, I find myself creating "Config" classes, filling
them with data read from a simple text file, and then passing instances of config around
to do all the work. What I would like to get some advice on, or links to, is Ruby-ish
methods of reading/parsing text files.

A lot of text files have, for example, some sort of header that says how much data is
coming, followed by the data itself, e.g.

Number of data points: 5
1 2
3 4
5 6
7 8
9 0
Number of data points: 2
10 20
11 21
Number of data points: 20
1 2
2 3
...etc.

Or, svn log output where the header line says how many lines of log message follow.
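The svn log case follows the same "header gives the count" pattern. A minimal sketch, assuming plain (non-verbose) `svn log` output where each entry is a dashed separator line, a header like `r2 | paulv | 2007-01-24 09:00:00 -0500 (Wed, 24 Jan 2007) | 2 lines`, one blank line, and then exactly that many message lines. `parse_svn_log` and `SvnEntry` are hypothetical names, and the sample log is made up for illustration:

```ruby
require 'stringio'

# Hypothetical record type for one log entry.
SvnEntry = Struct.new(:revision, :author, :date, :message)

# Assumed entry layout: separator line, "rREV | author | date | N line(s)",
# a blank line, then exactly N message lines.
def parse_svn_log(io)
  entries = []
  while (line = io.gets)
    # Only header lines match; separators and anything else are skipped.
    next unless line =~ /\Ar(\d+) \| (.*?) \| (.*?) \| (\d+) lines?\s*\z/
    rev, author, date, count = $1.to_i, $2, $3, $4.to_i
    io.gets # consume the blank line that follows the header
    # The header says how many lines follow, so read exactly that many here;
    # the outer loop then resumes at the next separator.
    message = Array.new(count) { io.gets.to_s.chomp }
    entries << SvnEntry.new(rev, author, date, message)
  end
  entries
end

# Made-up sample log for illustration.
sample = [
  '-' * 72,
  'r2 | paulv | 2007-01-24 09:00:00 -0500 (Wed, 24 Jan 2007) | 2 lines',
  '',
  'Fix header parsing',
  'Second line of the message',
  '-' * 72,
  'r1 | paulv | 2007-01-23 17:00:00 -0500 (Tue, 23 Jan 2007) | 1 line',
  '',
  'Initial import',
  '-' * 72
].join("\n") + "\n"

entries = parse_svn_log(StringIO.new(sample))
entries.each { |e| puts "r#{e.revision} (#{e.author}): #{e.message.join(' / ')}" }
```

Because the message lines are consumed inside the loop body, a message line that happened to look like a header would never reach the regex.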

I find I'm struggling to figure out a tidy way to read these sorts of files. If, for
example, I iterate over the lines,

IO.readlines(file_name).each do |line|
  # ...parse the line
end

How do I take advantage of the fact that the "header" line tells me how much actual data
follows before the next header? I.e. I discover that I need to read 5 points, so I read 5
points, and the next line parsed in the above iteration is the next header line.
Sort of short-circuiting the iteration.
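One way to get that short circuit is to stop walking the lines with a single `each` and instead treat them as a queue you pull from. A minimal sketch of that approach, assuming every block starts with a `Number of data points: N` header and each data line holds whitespace-separated integers. `read_point_sets` is a hypothetical name, and `StringIO` stands in for the file (the array returned by `IO.readlines(file_name)` behaves the same way):

```ruby
require 'stringio'

# Hypothetical helper: treat the lines as a queue and pull from it.
def read_point_sets(io)
  lines = io.readlines
  sets = []
  until lines.empty?
    header = lines.shift
    next if header.strip.empty? # tolerate blank lines between blocks
    count = header[/Number of data points:\s*(\d+)/, 1]
    raise "Expected a header line, got #{header.inspect}" unless count
    # shift(n) removes and returns the next n lines, so the following pass
    # of the loop starts exactly at the next header.
    block = lines.shift(count.to_i)
    raise "Expected #{count} points, got #{block.size}" if block.size != count.to_i
    sets << block.map { |l| l.split.map(&:to_i) }
  end
  sets
end

data = StringIO.new("Number of data points: 2\n1 2\n3 4\nNumber of data points: 1\n10 20\n")
sets = read_point_sets(data)
p sets # prints [[[1, 2], [3, 4]], [[10, 20]]]
```

With a real file this would be `File.open(file_name) { |f| read_point_sets(f) }`. The trade-off is that the whole file is read into memory first, which is usually fine for config-sized inputs.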

The solution I've come up with so far is to use "sentinel" values that flag what is to
come, but it's yuckily kludgy. Any tips from the 'sperts?

Apologies if this is a CS101 type of question.

cheers,

paulv

--
Paul van Delst
CIMSS @ NOAA/NCEP/EMC
"Ride lots." - Eddy Merckx

Ara.T.Howard

1/24/2007 3:57:00 PM


William James

1/24/2007 4:02:00 PM




On Jan 24, 9:33 am, Paul van Delst <Paul.vanDe...@noaa.gov> wrote:
> [snip original post]


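open('data1'){|handle|
  # Each header line says how many data lines follow it.
  while header = handle.gets do
    # Pull the count out of the header and read exactly that many lines,
    # so the next gets in the while condition lands on the next header.
    header[ /\d+/ ].to_i.times {
      p handle.gets
    }
  end
}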
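William's loop reads straight from the file handle, so only the current line is held in memory, and the inner `times` loop is exactly the short circuit: it consumes the data lines before the `while` sees the next header. A variation that collects each block into an array of integer pairs instead of printing it, and complains if the file ends early. `collect_blocks` is a hypothetical name and the `StringIO` input is made-up demo data:

```ruby
require 'stringio'

# Same gets-driven structure as the loop above, but returning the data.
def collect_blocks(handle)
  blocks = []
  while (header = handle.gets)
    count = header[/\d+/].to_i # a header with no number gives nil.to_i == 0
    block = Array.new(count) do
      # Read the data line directly, bypassing the outer while loop.
      line = handle.gets or raise "File ended inside a block of #{count} points"
      line.split.map(&:to_i)
    end
    blocks << block
  end
  blocks
end

blocks = collect_blocks(StringIO.new("Number of data points: 2\n1 2\n3 4\nNumber of data points: 1\n10 20\n"))
p blocks # prints [[[1, 2], [3, 4]], [[10, 20]]]
```

Passing any IO keeps it usable with `File.open('data1') { |f| collect_blocks(f) }` as well as with in-memory test data.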

Robert Klemme

1/24/2007 4:21:00 PM


On 24.01.2007 17:02, William James wrote:
>
> On Jan 24, 9:33 am, Paul van Delst <Paul.vanDe...@noaa.gov> wrote:
>> [snip original post]
>
>
> open('data1'){|handle|
>   while header = handle.gets do
>     header[ /\d+/ ].to_i.times {
>       p handle.gets
>     }
>   end
> }

Or test after the fact:

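# untested
sets = []      # every data set read so far
current = nil  # rows of the set currently being read
items = nil    # row count promised by the current header

File.foreach('data1') do |line|
  case line
  when /Number of data points: (\d+)/
    # New header: first check that the previous set was complete.
    raise "Wrong amount" if current && current.size != items
    items = $1.to_i
    current = []
    sets << current
  else
    current << line.scan(/\d+/).map! {|x| x.to_i}
  end
end

# The last set has no following header, so check it here.
raise "Wrong amount" if current && current.size != items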
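A different technique (not Robert's code) that keeps his check-after-the-fact idea: split the lines into chunks that each start at a header with `Enumerable#slice_before`, then compare each chunk's size with the count in its header. This needs Ruby 1.9.2 or later, so it was not available on the Ruby 1.8 of this thread's era. `parse_checked` and `HEADER` are hypothetical names:

```ruby
require 'stringio'

HEADER = /Number of data points: (\d+)/

# Group lines into chunks that each begin with a header, then validate.
def parse_checked(io)
  lines = io.each_line.reject { |line| line.strip.empty? } # drop blank lines
  lines.slice_before(HEADER).map do |header, *rows|
    expected = header[HEADER, 1] or raise "Data before first header: #{header.inspect}"
    if rows.size != expected.to_i
      raise "Wrong amount: expected #{expected}, got #{rows.size}"
    end
    rows.map { |row| row.split.map(&:to_i) }
  end
end

good = StringIO.new("Number of data points: 2\n1 2\n3 4\n\nNumber of data points: 1\n10 20\n")
result = parse_checked(good)
p result # prints [[[1, 2], [3, 4]], [[10, 20]]]
```

Grouping first means no mutable `current`/`items` state has to be carried between iterations; the cost is that all lines are read before any checking happens.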

Regards

robert

Paul Van Delst

1/24/2007 6:07:00 PM


Robert Klemme wrote:
> On 24.01.2007 17:02, William James wrote:
>>
>> On Jan 24, 9:33 am, Paul van Delst <Paul.vanDe...@noaa.gov> wrote:

[snip quoted question and code]

To all responders, as always, thanks very much. You guys are great. One day I will grok
this much better (but I have some unlearning to do...)

cheers,

paulv

p.s. Ara, I do use YAML for some things, but I don't always (actually, quite rarely) have
control of how the file is created. :o(
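Since YAML comes up here, a minimal sketch, using Ruby's standard `yaml` library, of the case where you do control how the file is written. Each list carries its own length, so neither a count header nor a hand-written parser is needed. The values are simply the first two point sets from the example at the top of the thread:

```ruby
require 'yaml'

# The first two point sets from the original example.
sets = [
  [[1, 2], [3, 4], [5, 6], [7, 8], [9, 0]],
  [[10, 20], [11, 21]]
]

text = YAML.dump(sets)     # what would be written to the file
restored = YAML.load(text) # what reading it back yields
puts text
p restored == sets # prints true
```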
