comp.lang.ruby

parsing text into usable numerical data

Cthulhu __

6/17/2008 5:45:00 PM

Hey total ruby n00b here...
I'm having trouble with parsing data into ruby for statistical analysis.

The data looks like this:
32 0 0 0 0 0 0 0 0 8412803500 0 0 0 0 0 0 0 46655166 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 240554000 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 85321000 0 0
0 0 0 0 0 479719000 0 0 0 97823285 283432000 0 73887750 0 0 157225000
88659750 285211000 70285000 0 161747000 161167000 234739666 120400000
300083000 0 0 202327250 111865000 183127000 0 161027000 0 0 0

33 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

I need to store the index of the entry, in this case 32 and 33, as the
name of the two-dimensional array (array of arrays, as Ruby handles it?)
and then each value (a lot of zeros in the case of 33) as a unique entry.
The format is plain text right now and I have had no luck using
File.readline.

For some reason this does NOT work; the array dimensions do not match
the expected structure:
elsif /aqua_t/=~(files_to_parse[i])
line_counter = 0
line = File.readlines(files_to_parse[i]).each do |line|
line.each{|x| x.to_i}
raw_data_t[line_counter]=line.split
aqua = [0]
line_counter+=1
end#ends block over lines


Any advice would be much appreciated.
--m
--
Posted via http://www.ruby-....

12 Answers

Jesús Gabriel y Galán

6/17/2008 9:34:00 PM


On Tue, Jun 17, 2008 at 7:44 PM, Cthulhu __ <weedmasterp@gmail.com> wrote:
> Hey total ruby n00b here...
> I'm having trouble with parsing data into ruby for statistical analysis.
>
> The data looks like this:
> 32 0 0 0 0 0 0 0 0 8412803500 0 0 0 0 0 0 0 46655166 0 0 0 0 0 0 0 0 0 0
> 0 0 0 0 0 240554000 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 85321000 0 0
> 0 0 0 0 0 479719000 0 0 0 97823285 283432000 0 73887750 0 0 157225000
> 88659750 285211000 70285000 0 161747000 161167000 234739666 120400000
> 300083000 0 0 202327250 111865000 183127000 0 161027000 0 0 0
>
> 33 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>
> I need to store the index of the entry, in this case 32 and 33, as the
> name of the 2 Dimensional array (array of arrays as ruby handles it?)
> and then each value (a lot of zeros in the case of 33) as a unique entry.

If you need a "name" for the entry then you might be thinking of a hash, in
which 32 and 33 would be the keys and the rest of the line the value.
What do you mean by "a unique entry"? Maybe the rest of the string, as a string?
An array with an element for each number in the string?

> The format is plain text right now and I have had no luck using
> File.readline
>
> For some reason this does NOT work, the array dimensions do not match
> expected structure:
> elsif /aqua_t/=~(files_to_parse[i])
> line_counter = 0
> line = File.readlines(files_to_parse[i]).each do |line|

The each method returns the full enumerable, so after the
iteration, the line variable will reference an array of all lines.
Not sure that's what you want.
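
To make the point about the return value concrete, here is a minimal sketch. The sample lines and the method name are made up for illustration; the only behaviour relied on is that Array#each returns the array it was called on.

```ruby
# Hypothetical stand-in for File.readlines: an array of two short lines.
SAMPLE_LINES = ["32 0 0 8412803500\n", "33 0 0 0\n"]

# each runs the block for its side effects only, then returns its receiver.
def each_return_value(lines)
  lines.each { |line| line.split }
end

result = each_return_value(SAMPLE_LINES)
puts result.equal?(SAMPLE_LINES)   # true: the very same array object
puts result.first.class            # String: the block's results were discarded
```

So assigning the result of each to a variable just gives you the unmodified list of lines back.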

> line.each{|x| x.to_i}

This line does nothing: you don't assign the return values of the block
anywhere or modify anything inside the block. But it makes
me think you want an array of numbers.
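
A small sketch of the difference between each and map, using a shortened, made-up record. In the original snippet line is a String; as far as I know String#each only exists in Ruby 1.8 (where it iterates over lines) and is gone in 1.9, so this example works on the array of fields instead. The method name to_integers is just for illustration.

```ruby
# Convert an array of numeric strings to integers.
# map builds a new array from the block's return values; each throws them away.
def to_integers(fields)
  fields.map { |x| x.to_i }
end

fields = "32 0 8412803500".split
fields.each { |x| x.to_i }   # no effect: fields still holds strings
p fields                     # ["32", "0", "8412803500"]
p to_integers(fields)        # [32, 0, 8412803500]
```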

> raw_data_t[line_counter]=line.split
> aqua = [0]
> line_counter+=1
> end#ends block over lines

This works for me:

irb(main):001:0> line_counter = 0
irb(main):005:0> raw = []
irb(main):006:0> File.readlines("data.txt").each do |line|
irb(main):007:1* raw[line_counter] = line.split
irb(main):008:1> line_counter += 1
irb(main):009:1> end

irb(main):010:0> raw
=> [["32", "0", "0", "0", "0", "0", "0", "0", "0", "8412803500", "0",
"0", "0", "0", "0", "0", "0", "46655166", "0", "0", "0", "0", "0",
"0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "240554000", "0",
"0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0",
"0", "0", "0", "0", "0", "85321000", "0", "0", "0", "0", "0", "0",
"0", "479719000", "0", "0", "0", "97823285", "283432000", "0",
"73887750", "0", "0", "157225000", "88659750", "285211000",
"70285000", "0", "161747000", "161167000", "234739666", "120400000",
"300083000", "0", "0", "202327250", "111865000", "183127000", "0",
"161027000", "0", "0", "0"], ["33", "0", "0", "0", "0", "0", "0", "0",
"0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0",
"0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0",
"0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0",
"0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0",
"0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0",
"0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0"]]

So I'm not sure what the exact problem you are having is. The above
could be simplified, though, to avoid the counter:

raw = []
File.readlines("data.txt").each do |line|
  raw << line.split
end


If you need the hash with keys 32 and 33, and as value an array of
numbers, you can do this:

raw = {}
File.readlines("data.txt").each do |line|
  key, *value = line.split
  value.map! {|x| x.to_i}
  raw[key] = value
end

If the key needs to be a number you can to_i it too.
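
A sketch of that variant with integer keys. It reads from a string instead of data.txt so it runs on its own, and it skips blank lines, which is an addition not present in the code above. The method name parse_to_hash and the shortened sample data are assumptions for illustration.

```ruby
# Parse whitespace-separated records into a hash:
# first field (as an Integer) => remaining fields (as Integers).
def parse_to_hash(text)
  raw = {}
  text.each_line do |line|
    key, *values = line.split.map { |x| x.to_i }
    next if key.nil?          # blank line: split returned an empty array
    raw[key] = values
  end
  raw
end

sample = "32 0 0 8412803500\n\n33 0 0 0\n"   # made-up, shortened data
parsed = parse_to_hash(sample)
p parsed.keys   # [32, 33]
p parsed[32]    # [0, 0, 8412803500]
```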

Hope this helps,

Jesus.

Robert Klemme

6/18/2008 6:30:00 AM


On 17.06.2008 23:34, Jesús Gabriel y Galán wrote:
> On Tue, Jun 17, 2008 at 7:44 PM, Cthulhu __ <weedmasterp@gmail.com> wrote:
>> Hey total ruby n00b here...
>> I'm having trouble with parsing data into ruby for statistical analysis.
>>
>> The data looks like this:
>> 32 0 0 0 0 0 0 0 0 8412803500 0 0 0 0 0 0 0 46655166 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 240554000 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 85321000 0 0
>> 0 0 0 0 0 479719000 0 0 0 97823285 283432000 0 73887750 0 0 157225000
>> 88659750 285211000 70285000 0 161747000 161167000 234739666 120400000
>> 300083000 0 0 202327250 111865000 183127000 0 161027000 0 0 0
>>
>> 33 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>
>> I need to store the index of the entry, in this case 32 and 33, as the
>> name of the 2 Dimensional array (array of arrays as ruby handles it?)
>> and then each value (a lot of zeros in the case of 33) as a unique entry.
>
> If you need a "name" for the entry then you might be thinking of a hash, in
> which 32 and 33 would be the keys and the rest of the line the value.
> What do you mean by "a unique entry"? Maybe the rest of the string, as a string?
> An array with an element for each number in the string?

I believe he wants to use the first number as the index and the rest of the
line as an array of integers, thus yielding a two-dimensional array.

This is probably what I'd do:

data = []
File.foreach "data.txt" do |line|
  idx, *dat = line.scan(/\d+/).map {|s| s.to_i}
  data[idx] = dat
end
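
One thing worth knowing about storing records at their own index in an Array: Ruby pads every slot below the highest index with nil. A minimal sketch, with made-up records and a hypothetical method name:

```ruby
# Store each record under its own leading index, as in the loop above.
# Each record is [index, value, value, ...].
def index_records(records)
  data = []
  records.each { |idx, *dat| data[idx] = dat }
  data
end

data = index_records([[3, 10, 20], [5, 30, 40]])
p data.length          # 6: slots 0..5 exist
p data[3]              # [10, 20]
p data[0]              # nil: never assigned
p data.compact.length  # 2: only the real records
```

If the indices in the file start at 32, the first 32 slots will be nil, which may matter for later statistics; a hash keyed by index avoids the padding.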

Kind regards

robert

Cthulhu __

6/18/2008 4:11:00 PM


That is exactly what I was looking for. Thank you for the very elegant
solution.
--m

Cthulhu __

6/18/2008 6:11:00 PM



Still having some problems with this program. I'm getting a TypeError
when the value of

dx, *dat = line.scan(/\d+/).map.compact
is nil

Any suggestions on how to handle this? My current implementation looks
like this:

raw_data = Array.new
File.foreach files_to_parse[i].to_s do |line|
idx, *dat = line.scan(/\d+/).map.compact! {|s| s.to_i}
raw_data[idx] = dat
end

but if I include a conditional statement that excludes nil inside this
block
{|s| s.to_i}

then the size of the array changes, which I need to avoid.

--m

Cthulhu __

6/18/2008 6:18:00 PM


Oh... and also, I'm not sure exactly how idx is being incremented?
--m

Chris Hulan

6/18/2008 7:03:00 PM


On Jun 18, 2:10 pm, Cthulhu __ <weedmast...@gmail.com> wrote:
> Still having some problems with this program. I'm getting a TypeError
> when the value of
>
> dx, *dat = line.scan(/\d+/).map.compact
> is nil
>
> Any suggestions on how to handle this? my current implementation looks
> like this:
>
> raw_data = Array.new
> File.foreach files_to_parse[i].to_s do |line|
> idx, *dat = line.scan(/\d+/).map.compact! {|s| s.to_i}
> raw_data[idx] = dat
> end
....

Looks like the blank line between records is the culprit.
You could just skip it explicitly:

> File.foreach files_to_parse[i].to_s do |line|
next if line.strip == '' #skip line if it is empty
> idx, *dat = line.scan(/\d+/).map.compact! {|s| s.to_i}
> raw_data[idx] = dat
> end

In response to your question about where idx comes from: the code in that
line creates an array of ints from the current line, then the parallel
assignment puts the first number in the array into idx and the rest of
the array into dat.
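
The parallel assignment described above can be tried in isolation. This is only a sketch; split_record is a hypothetical helper and the record is shortened.

```ruby
# Split one record into its leading index and the remaining values.
def split_record(line)
  idx, *dat = line.scan(/\d+/).map { |s| s.to_i }
  [idx, dat]
end

idx, dat = split_record("32 0 0 8412803500")
p idx   # 32
p dat   # [0, 0, 8412803500]
```

So idx is never incremented: it is read fresh from the first number on each line.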

hth
Chris
cheers

David A. Black

6/18/2008 7:10:00 PM


Hi --

On Thu, 19 Jun 2008, Cthulhu __ wrote:

>
> Still having some problems with this program. I'm getting a TypeError
> when the value of
>
> dx, *dat = line.scan(/\d+/).map.compact
> is nil
>
> Any suggestions on how to handle this? my current implementation looks
> like this:
>
> raw_data = Array.new
> File.foreach files_to_parse[i].to_s do |line|
> idx, *dat = line.scan(/\d+/).map.compact! {|s| s.to_i}
> raw_data[idx] = dat
> end
>
> but if I include a conditional statement that excludes nil inside this
> block
> {|s| s.to_i}
>
> then the size of the array changes, which i need to avoid.

There won't be any nils in the array resulting from line.scan(/\d+/).
It will either be empty or contain strings. So you shouldn't need to
compact it. That line is a bit tangled in general. What happens if you
run Robert's code as posted?
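
A quick sketch of what scan gives back for a line without digits, and why that ends in a TypeError once the result is used as an array index. The helper name leading_index is made up for illustration.

```ruby
# Return the first number on a line, or nil if the line has no digits.
def leading_index(line)
  idx, *_rest = line.scan(/\d+/).map { |s| s.to_i }
  idx
end

p "\n".scan(/\d+/)         # []: empty, but no nils inside
p leading_index("\n")      # nil: nothing to assign to idx
p leading_index("33 0 0")  # 33

data = []
begin
  data[leading_index("\n")] = []   # same shape as raw_data[idx] = dat
rescue TypeError => e
  puts "TypeError: #{e.message}"
end
```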


David

--
Rails training from David A. Black and Ruby Power and Light:
ADVANCING WITH RAILS June 16-19 Berlin
ADVANCING WITH RAILS July 21-24 Edison, NJ
See http://www.r... for details and updates!

David A. Black

6/18/2008 7:13:00 PM


Hi --

On Thu, 19 Jun 2008, Chris Hulan wrote:

> On Jun 18, 2:10 pm, Cthulhu __ <weedmast...@gmail.com> wrote:
>> Still having some problems with this program. I'm getting a TypeError
>> when the value of
>>
>> dx, *dat = line.scan(/\d+/).map.compact
>> is nil
>>
>> Any suggestions on how to handle this? my current implementation looks
>> like this:
>>
>> raw_data = Array.new
>> File.foreach files_to_parse[i].to_s do |line|
>> idx, *dat = line.scan(/\d+/).map.compact! {|s| s.to_i}
>> raw_data[idx] = dat
>> end
> ...
>
> looks like the blank line between records is the culprit
> could just skip it explicitly:
>
>> File.foreach files_to_parse[i].to_s do |line|
> next if line.strip == '' #skip line if it is empty
>> idx, *dat = line.scan(/\d+/).map.compact! {|s| s.to_i}
>> raw_data[idx] = dat
>> end

That scan/map/compact! line is still wrong. The block should go with
map, not compact!, and compact! returns nil if its receiver doesn't
change:

[1,2,3].compact! # => nil

There's no reason that line.scan(/\d+/) would ever contain nil, so
compact! will always return nil and map, in that position, does
nothing in 1.8 and makes compact! blow up in 1.9 :-)
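
To see how that nil travels into the assignment, here is a small sketch. first_and_rest is a hypothetical helper that only performs the same parallel assignment as the line under discussion.

```ruby
# Perform the same destructuring as `idx, *dat = ...` on any value.
def first_and_rest(value)
  idx, *dat = value
  [idx, dat]
end

p([1, 2, 3].compact!)    # nil: nothing was removed
p([1, nil, 3].compact!)  # [1, 3]: something was removed
p([1, 2, 3].compact)     # [1, 2, 3]: the non-bang version always returns an array

# Destructuring nil leaves idx as nil and dat empty,
# which is what then fails in raw_data[idx] = dat.
p first_and_rest([1, 2, 3].compact!)   # [nil, []]
```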


David


Cthulhu __

6/18/2008 7:28:00 PM


So the code as taken from Chris' post returns this error:

read_data.rb:72:in `[]=': no implicit conversion from nil to integer
(TypeError)
from read_data.rb:72
from read_data.rb:69:in `foreach'
from read_data.rb:69

The code looks like this with line numbers...

69 File.foreach files_to_parse[i].to_s do |line|
70 next if line.strip == ''
71 idx, *dat = line.scan(/\d+/).map {|s| s.to_i}
72 raw_data[idx] = dat
73 end
74 puts raw_data

I suspect it may be a problem in the formatting of the data... perhaps
there is some way to remove any non-integer characters, e.g. delimiting
chars, before parsing?
--m

David A. Black

6/18/2008 7:59:00 PM


Hi --

On Thu, 19 Jun 2008, Cthulhu __ wrote:

> So the code as taken from Chris' post returns this error:
>
> read_data.rb:72:in `[]=': no implicit conversion from nil to integer
> (TypeError)
> from read_data.rb:72
> from read_data.rb:69:in `foreach'
> from read_data.rb:69
>
> The code looks like this with line numbers...
>
> 69 File.foreach files_to_parse[i].to_s do |line|
> 70 next if line.strip == ''
> 71 idx, *dat = line.scan(/\d+/).map {|s| s.to_i}
> 72 raw_data[idx] = dat
> 73 end
> 74 puts raw_data
>
> I suspect it may be a problem in the formatting of the data... perhaps
> there is some way to remove any non-integer characters eg. delimiting
> chars before parsing?

The scan(/\d+/) will scan for digits, and will ignore everything else,
so you don't have to pre-treat the lines.

I would throw in:

puts line if idx.nil?

before line 72, and see which lines are giving you the problems. All
the lines you showed in your sample data were either blank or
contained only digits, so they shouldn't cause this problem.
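
Along the same lines, here is a sketch of a loop that collects the offending lines instead of crashing on them. It reads from a string so it can be run on its own; the method name parse_with_report, the sample text and the "# comment" line are all made up, since the real problem lines are not shown in this thread.

```ruby
# Parse records into an array indexed by the leading number.
# Lines with no digits are skipped; non-blank ones are reported
# together with their line number.
def parse_with_report(text)
  data = []
  skipped = []
  text.each_line.with_index(1) do |line, lineno|
    idx, *dat = line.scan(/\d+/).map { |s| s.to_i }
    if idx.nil?
      skipped << [lineno, line] unless line.strip.empty?
      next
    end
    data[idx] = dat
  end
  [data, skipped]
end

sample = "2 10 20\n\n# comment\n3 0 0\n"   # hypothetical input
data, skipped = parse_with_report(sample)
p data[2]   # [10, 20]
p skipped   # [[3, "# comment\n"]]
```

With a real file, the same loop body works inside File.foreach; the with_index(1) call is only there to give readable line numbers.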


David
