
comp.lang.ruby

Newbie with a project (parse tab-delimited file and generate statistics?)

Aces Ace

12/11/2006 7:00:00 PM

I have tab-delimited files that I want to parse and generate statistics
from. I can read a file into an array using IO.readlines, but I don't
know what good that does for sorting. The other path I was following
was this:

class MyNewClass
  File.open("/home/user/testdata") do |openfile|
    openfile.each do |iterationshere|
      status, keyword, location, state, zip, date, resultcount,
        searchcount = iterationshere.chomp.split(/\t+/)
      # Keep the output on one line per record.
      puts "keyword: #{keyword}, status: #{status}, location: #{location}, " \
           "state: #{state}, zip: #{zip}, date: #{date}, " \
           "resultcount: #{resultcount}, searchcount: #{searchcount}"
    end
  end
end


This goes through the file line by line, splits each line on the tab
character, assigns the fields to the variables status, keyword, etc., and
then prints that line.
But how do I put those into a hash and sort on the fields? Eventually I
will need this to go into a database, but I want to take this one step
at a time.
Thanks for helping this newbie. I have been using Perl for a while
(I still consider myself a newbie with it), but I have decided to get more
into web development using the Rails framework, and I noticed Ruby has
regexp support built in. WOOT.

--
Posted via http://www.ruby-....

4 Answers

Paul Lutus

12/11/2006 8:01:00 PM


Aces Ace wrote:

> I have tab delimited files that I want to parse and generate statistics
> with.

First, and most important: is the data table well-behaved? That is, are
there tabs only between fields and nowhere else, and linefeeds only at
the ends of records and nowhere else?

I ask because if these statements are true, it is very easy to break such a
file up into records and fields. See my code sample below.
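One quick way to answer that question before parsing is to count the fields on every line. This is only an illustrative sketch: `irregular_lines` is a hypothetical helper, and the expected field count is an assumption you would supply for your own data. It splits with a limit of -1 so that empty fields (two adjacent tabs, or a trailing tab) are kept; a pattern such as /\t+/ would merge them and shift every later field.

```ruby
# Hypothetical helper: report line numbers whose field count is not the expected one.
def irregular_lines(text, expected_fields)
  bad = []
  text.each_line.with_index(1) do |line, lineno|
    # A limit of -1 keeps empty and trailing fields instead of dropping them.
    count = line.chomp.split("\t", -1).size
    bad << lineno unless count == expected_fields
  end
  bad
end

# Sample data: line 2 has an empty middle field, line 3 is missing a field.
sample = "a\tb\tc\nd\t\tf\ng\th\n"
p irregular_lines(sample, 3)   # => [3]
p "d\t\tf".split("\t", -1)     # => ["d", "", "f"]
p "d\t\tf".split(/\t+/)        # => ["d", "f"]
```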

/ ...

> This obviously gos through the file line by line and splits on the tab
> character and assigns the the variables status keyword etc then prints
> that line.
> But how do I put those into a hash and sort on the fields.

I don't think you necessarily want to put the data into a hash.
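That said, the OP did ask about hashes. If named fields are wanted, one option is to zip the field names from the original post with each line's values. This is a minimal sketch with hypothetical sample lines in place of the real file; note that `Array#to_h` needs Ruby 2.1 or later, which is newer than this thread.

```ruby
# Field names in the order the original post splits them.
FIELDS = %w[status keyword location state zip date resultcount searchcount]

# Hypothetical sample lines standing in for the real file.
lines = [
  "active\truby\tSeattle\tWA\t98101\t2006-12-01\t40\t7\n",
  "active\trails\tAustin\tTX\t73301\t2006-12-02\t15\t12\n",
]

# Pair each field name with its value: {"status" => "active", ...}
records = lines.map { |line| FIELDS.zip(line.chomp.split("\t")).to_h }

# Sort by a named field (largest search count first) instead of an index.
sorted = records.sort_by { |r| -r["searchcount"].to_i }
sorted.each { |r| puts "#{r['keyword']}: #{r['searchcount']}" }
```

The trade-off is memory and a little speed in exchange for readable field access by name.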

> Eventually I
> will need this to go into a database, but I want to take this one step
> at a time.

Experiment with this program:

----------------------------

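#!/usr/bin/ruby -w

# Read the whole tab-separated file into one string.
data = File.read("data.tsv")

database = []

# Build one array (record) of fields per line; chomp drops the newline
# that would otherwise stay attached to the last field.
data.each_line do |line|
  record = []
  line.chomp.split(/\t/).each do |field|
    record << field
  end
  database << record
end

# Sort numerically on the fifth field (index 4), largest first.
database = database.sort { |a,b| b[4].to_i <=> a[4].to_i }

# Print each record back out as one tab-separated line.
database.each { |record| puts record.join("\t") }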

----------------------------

I have a table of DVD movie titles that happens to be in the data format you
are describing, and I used it to test this code sample. This code builds
the required data array, and then it can sort on any chosen field.
Further, if the chosen field contains an integer rather than a string, you
can sort numerically instead of by default string ordering, as the example
above does with to_i. Notice also that the example sorts the data in
reverse (descending) order, because the block compares b to a rather than
a to b.
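To see why the to_i conversion and the b-before-a comparison matter, here is a small sketch with hypothetical records (not Paul's DVD table) whose fifth field holds a number stored as a string.

```ruby
# Hypothetical records; index 4 holds a count stored as a string.
records = [
  ["Alien",  "1979", "R", "sci-fi", "9"],
  ["Brazil", "1985", "R", "comedy", "12"],
  ["Heat",   "1995", "R", "crime",  "3"],
]

# String ordering compares character by character, so "12" < "3" < "9".
by_string = records.sort { |a, b| a[4] <=> b[4] }.map { |r| r[4] }

# Numeric ordering converts with to_i first: 3 < 9 < 12.
by_number = records.sort { |a, b| a[4].to_i <=> b[4].to_i }.map { |r| r[4] }

# Swapping a and b in the block reverses the order (descending).
descending = records.sort { |a, b| b[4].to_i <=> a[4].to_i }.map { |r| r[4] }

p by_string    # => ["12", "3", "9"]
p by_number    # => ["3", "9", "12"]
p descending   # => ["12", "9", "3"]
```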

If you have any questions about this example, please post again.

--
Paul Lutus
http://www.ara...

Paul Lutus

12/11/2006 8:10:00 PM


Paul Lutus wrote:

/ ...

> Experiment with this program:

A hasty correction:

-----------------------------

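#!/usr/bin/ruby -w

data = File.read("data.tsv")

database = []

# Each line becomes an array of its tab-separated fields.
data.each_line do |line|
  database << line.chomp.split(/\t/)
end

# Sort numerically on the fifth field (index 4), largest first.
database = database.sort { |a,b| b[4].to_i <=> a[4].to_i }

database.each { |record| puts record.join("\t") }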

-----------------------------
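The thread title also mentions statistics. Once each record is an array of fields, simple aggregates follow from Enumerable. This is a minimal sketch with hypothetical rows, assuming the OP's field order (so index 3 is the state and index 7 the search count); adjust the indexes for real data.

```ruby
# Hypothetical rows in the OP's field order:
# status, keyword, location, state, zip, date, resultcount, searchcount
rows = [
  ["active", "ruby",  "Seattle", "WA", "98101", "2006-12-01", "40", "7"],
  ["active", "rails", "Spokane", "WA", "99201", "2006-12-02", "15", "12"],
  ["closed", "perl",  "Austin",  "TX", "73301", "2006-12-03", "22", "5"],
]

# Group the records by state (index 3).
by_state = rows.group_by { |r| r[3] }

# Total and average search count (index 7) per state.
totals = {}
averages = {}
by_state.each do |state, recs|
  sum = recs.inject(0) { |acc, r| acc + r[7].to_i }
  totals[state] = sum
  averages[state] = sum.fdiv(recs.size)
end

# Keyword with the highest search count overall.
busiest = rows.max_by { |r| r[7].to_i }[1]

totals.each do |state, n|
  puts "#{state}: total #{n}, average #{averages[state]}"
end
# WA: total 19, average 9.5
# TX: total 5, average 5.0
puts "busiest keyword: #{busiest}"   # busiest keyword: rails
```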

--
Paul Lutus
http://www.ara...

William James

12/11/2006 9:24:00 PM


Paul Lutus wrote:
> Paul Lutus wrote:
>
> / ...
>
> > Experiment with this program:
>
> A hasty correction:
>
> -----------------------------
>
> #!/usr/bin/ruby -w
>
> data = File.read("data.tsv")
>
> database = []
>
> data.each do |line|
> database << line.split(/\t/)
> end
>
> database = database.sort { |a,b| b[4].to_i <=> a[4].to_i }
>
> puts database
>
> -----------------------------
>
> --
> Paul Lutus
> http://www.ara...

p IO.readlines('junk').map{|s| s.chomp.split("\t")}.sort_by{|a|
a[4].to_i}
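William's one-liner does the same job as Paul's program: it reads the lines, strips each newline with chomp, splits on tabs, and sorts on the fifth field with sort_by. Unlike Paul's version it sorts ascending. Spread out with comments, and with hypothetical in-memory lines in place of the file 'junk', it might look like this:

```ruby
# Hypothetical input standing in for IO.readlines('junk').
lines = ["Alien\t1979\tR\tsci-fi\t9\n",
         "Brazil\t1985\tR\tcomedy\t12\n",
         "Heat\t1995\tR\tcrime\t3\n"]

# chomp removes the trailing newline before splitting into fields.
rows = lines.map { |s| s.chomp.split("\t") }

# sort_by computes each key once per element; ascending, as in the one-liner.
ascending = rows.sort_by { |a| a[4].to_i }

# Negating the key gives the descending order of Paul's version.
descending = rows.sort_by { |a| -a[4].to_i }

p ascending.map(&:first)    # => ["Heat", "Alien", "Brazil"]
p descending.map(&:first)   # => ["Brazil", "Alien", "Heat"]
```

Because sort_by evaluates the block once per element rather than once per comparison, it is usually the better choice when the key (here a to_i conversion) costs something to compute.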

Paul Lutus

12/11/2006 11:24:00 PM


William James wrote:

/ ...

> p IO.readlines('junk').map{|s| s.chomp.split("\t")}.sort_by{|a|
> a[4].to_i}

Where's my caddy? :)

One of my goals was that the OP would understand my code at a glance.

--
Paul Lutus
http://www.ara...