Asp Forum - Parse csv similar file

Rebhan, Gilbert

2/6/2007 2:32:00 PM

Hi,

<newbie>

i have a txtfile with a format like that =

AP850KP;INCLIB;E023889;AP013;240107;0730
AP850SD$;INCLIB;E052337;AP013;240107;0730
AP850SDA;INCLIB;E050441;AP013;240107;0730
AP850SDI;INCLIB;E023889;AP013;240107;0730
AP850SDO;INCLIB;E052337;AP013;240107;0730
AP850SDS;INCLIB;E050441;AP013;240107;0730
...

i want to get a collection for every E followed by digits,
so with the example above, i want to get =

collections:
E023889
E052337
E050441
...

each collection should contain datasets with the rest of the line, so
f.e.
E023889 would have =

[AP850KP;INCLIB;AP013;240107;0730,AP850SDI;INCLIB;AP013;240107;0730]

questions=
what kind of collection is the best ? is an array sufficient ?

right now i have =

efas=Array.new
File.open("mycsvfile", "r").each do |line|
if line =~ /(\w+.?);(\w+);(\w+);(\w+);(\w+);(\w+)/

efas<<$3.to_s<<',' unless efas.include?($3.to_s)

end
end
puts efas.to_s.chop

So I have all the E\d+ values, but how do I go further?

Are there better ways than regular expressions?
Any ideas?

<newbie/>

Regards, Gilbert

16 Answers

Brian Candler

2/6/2007 2:37:00 PM

On Tue, Feb 06, 2007 at 11:32:27PM +0900, Rebhan, Gilbert wrote:
> questions=
> what kind of collection is the best ? is an array sufficient ?

Depends what you want to do with it. If you want to be able to find an entry
E123456 quickly, then you'd use a hash. If you want to keep only the
first/last entry for a particular key (as it seems you do), using a hash
speeds things up here too.

> right now i have =
>
> efas=Array.new
> File.open("mycsvfile", "r").each do |line|
> if line =~ /(\w+.?);(\w+);(\w+);(\w+);(\w+);(\w+)/
>
> efas<<$3.to_s<<',' unless efas.include?($3.to_s)
>
> end
> end
> puts efas.to_s.chop

Try:

efas = Hash.new
...
efas[$3] = [$1,$2,$4,$5,$6] unless efas.has_key?($3)
...
puts efas.inspect

> Are there better ways than regular expressions?

You could look at String#split instead
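
A minimal sketch of the split approach, assuming every line has exactly six semicolon-separated fields (the variable names are only illustrative, they are not from the original post):

```ruby
line = "AP850SD$;INCLIB;E052337;AP013;240107;0730\n"

# chomp drops the trailing newline, split breaks the line at every semicolon
file, folder, ticket, user, date, time = line.chomp.split(";")

puts ticket   # E052337
puts file     # AP850SD$
```

Because split only looks at the separator, a name containing "$" needs no special handling.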

HTH,

Brian.

Rebhan, Gilbert

2/6/2007 2:55:00 PM

Hi,

-----Original Message-----
From: Brian Candler [mailto:B.Candler@pobox.com]
Sent: Tuesday, February 06, 2007 3:37 PM
To: ruby-talk ML
Subject: Re: Parse csv similar file

> what kind of collection is the best ? is an array sufficient ?
/*
Depends what you want to do with it. If you want to be able to find an
entry
E123456 quickly, then you'd use a hash. If you want to keep only the
first/last entry for a particular key (as it seems you do), using a hash
speeds things up here too.
*/

i don't need to find all entries E..... , but collect all datas
that belong to the different E.....

i want a collection for every E... that occurs, with all the lines
(except the E... itself) that contain that E in it

/*
Try:

efas = Hash.new
...
efas[$3] = [$1,$2,$4,$5,$6] unless efas.has_key?($3)
...
puts efas.inspect
*/

that gives me only one dataset in the hash, but there are more
entries that have E123456 in it.

Regards, Gilbert

Brian Candler

2/6/2007 3:13:00 PM

On Tue, Feb 06, 2007 at 11:54:59PM +0900, Rebhan, Gilbert wrote:
> > what kind of collection is the best ? is an array sufficient ?
> /*
> Depends what you want to do with it. If you want to be able to find an
> entry
> E123456 quickly, then you'd use a hash. If you want to keep only the
> first/last entry for a particular key (as it seems you do), using a hash
> speeds things up here too.
> */
>
> i don't need to find all entries E..... , but collect all datas
> that belong to the different E.....
>
> i want a collection for every E... that occurs, with all the lines
> (except the E... itself) that contain that E in it
>
> /*
> Try:
>
> efas = Hash.new
> ...
> efas[$3] = [$1,$2,$4,$5,$6] unless efas.has_key?($3)
> ...
> puts efas.inspect
> */
>
> that gives me only one dataset in the hash, but there are more
> entries that have E123456 in it.

I was just following your original example, which only kept the first line
for a particular E key.

If you want to keep them all, then I'd use a hash with each element being an
array.

efas[$3] ||= [] # create empty array if necessary
efas[$3] << [$1,$2,$4,$5,$6] # add a new line

So, given the following input

aaa;bbb;E123;ddd;eee;fff
ggg;hhh;E123;iii;jjj;kkk

you should get

efas = {
"E123" => [
["aaa","bbb","ddd","eee","fff"],
["ggg","hhh","iii","jjj","kkk"],
],
}

puts efas["E123"].size # 2
puts efas["E123"][0][3] # "eee"
puts efas["E123"][1][3] # "jjj"

In practice, to make it easier to manipulate this data, you'd probably want
to create a class to represent each object, rather than using a 5-element
array.

You would give each attribute a sensible name. I don't know what these
values mean, so I've just called them a to e here.

class Myclass
attr_accessor :a, :b, :c, :d, :e
def initialize(a, b, c, d, e)
@a = a
@b = b
@c = c
@d = d
@e = e
end
end

...
efas[$3] ||= []
efas[$3] << Myclass.new($1,$2,$4,$5,$6)

HTH,

Brian.

Gavin Kistner

2/6/2007 3:29:00 PM

On Feb 6, 7:32 am, "Rebhan, Gilbert" <Gilbert.Reb...@huk-coburg.de>
wrote:
> i have a txtfile with a format like that =
>
> AP850KP;INCLIB;E023889;AP013;240107;0730
> AP850SD$;INCLIB;E052337;AP013;240107;0730
> AP850SDA;INCLIB;E050441;AP013;240107;0730
> AP850SDI;INCLIB;E023889;AP013;240107;0730
> AP850SDO;INCLIB;E052337;AP013;240107;0730
> AP850SDS;INCLIB;E050441;AP013;240107;0730
> ..
>
> i want to get a collection for every E followed by digits,
> so with the example above, i want to get =

lines = DATA.readlines.map{ |line|
line.chomp.split( ';' )
}
lookup = {}
lines.each{ |data|
key = data.find{ |value| /^E/ =~ value }
lookup[ key ] = data
}
p lookup[ "E050441" ]
#=> ["AP850SDS", "INCLIB", "E050441", "AP013", "240107", "0730"]
__END__
AP850KP;INCLIB;E023889;AP013;240107;0730
AP850SD$;INCLIB;E052337;AP013;240107;0730
AP850SDA;INCLIB;E050441;AP013;240107;0730
AP850SDI;INCLIB;E023889;AP013;240107;0730
AP850SDO;INCLIB;E052337;AP013;240107;0730
AP850SDS;INCLIB;E050441;AP013;240107;0730

Drew Olson

2/6/2007 3:36:00 PM

Gavin Kistner wrote:
> On Feb 6, 7:32 am, "Rebhan, Gilbert" <Gilbert.Reb...@huk-coburg.de>
> wrote:
>> i want to get a collection for every E followed by digits,
>> so with the example above, i want to get =
>
> lines = DATA.readlines.map{ |line|
> line.chomp.split( ';' )
> }
> lookup = {}
> lines.each{ |data|
> key = data.find{ |value| /^E/ =~ value }
> lookup[ key ] = data
> }
> p lookup[ "E050441" ]
> #=> ["AP850SDS", "INCLIB", "E050441", "AP013", "240107", "0730"]
> __END__
> AP850KP;INCLIB;E023889;AP013;240107;0730
> AP850SD$;INCLIB;E052337;AP013;240107;0730
> AP850SDA;INCLIB;E050441;AP013;240107;0730
> AP850SDI;INCLIB;E023889;AP013;240107;0730
> AP850SDO;INCLIB;E052337;AP013;240107;0730
> AP850SDS;INCLIB;E050441;AP013;240107;0730

I think he wants to append this array with information each time he sees
the same key, so modify your code like so:

lines = DATA.readlines.map{ |line|
line.chomp.split( ';' )
}
lookup = {}
lines.each{ |data|
key = data.find{ |value| /^E/ =~ value }
lookup[ key ] ||= []
lookup[ key ] << data
}
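
In current Ruby versions the same grouping can probably be written with Enumerable#group_by. This is only a sketch: it assumes the ticket is always the third field, and the sample data is inlined in a string (the name input is made up) instead of being read from DATA, so that the snippet runs by itself.

```ruby
input = <<EOS
AP850KP;INCLIB;E023889;AP013;240107;0730
AP850SD$;INCLIB;E052337;AP013;240107;0730
AP850SDA;INCLIB;E050441;AP013;240107;0730
AP850SDI;INCLIB;E023889;AP013;240107;0730
EOS

rows = input.each_line.map { |line| line.chomp.split(";") }

# group_by returns a hash: ticket => array of all rows carrying that ticket
lookup = rows.group_by { |row| row[2] }

p lookup["E023889"].map { |row| row[0] }
#=> ["AP850KP", "AP850SDI"]
```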

--
Posted via http://www.ruby-....

Gregory Brown

2/6/2007 3:52:00 PM

On 2/6/07, Rebhan, Gilbert <Gilbert.Rebhan@huk-coburg.de> wrote:
>
> Hi,
>
> <newbie>
>
> i have a txtfile with a format like that =
>
> AP850KP;INCLIB;E023889;AP013;240107;0730
> AP850SD$;INCLIB;E052337;AP013;240107;0730
> AP850SDA;INCLIB;E050441;AP013;240107;0730
> AP850SDI;INCLIB;E023889;AP013;240107;0730
> AP850SDO;INCLIB;E052337;AP013;240107;0730
> AP850SDS;INCLIB;E050441;AP013;240107;0730
> ...
>
> i want to get a collection for every E followed by digits,
> so with the example above, i want to get =
>
> collections:
> E023889
> E052337
> E050441
> ...
>
> each collection should contain datasets with the rest of the line, so
> f.e.
> E023889 would have =
>
> [AP850KP;INCLIB;AP013;240107;0730,AP850SDI;INCLIB;AP013;240107;0730]
>
> questions=
> what kind of collection is the best ? is an array sufficient ?

Just for fun, here's a Ruport example:

require "rubygems"
require "ruport"
DATA = <<-EOS
AP850KP;INCLIB;E023889;AP013;240107;0730
AP850SD$;INCLIB;E052337;AP013;240107;0730
AP850SDA;INCLIB;E050441;AP013;240107;0730
AP850SDI;INCLIB;E023889;AP013;240107;0730
AP850SDO;INCLIB;E052337;AP013;240107;0730
AP850SDS;INCLIB;E050441;AP013;240107;0730
EOS

table = Ruport::Data::Table.parse(DATA, :has_names => false,
:csv_options=>{:col_sep=>";"})

table.column_names = %w[c1 c2 c3 c4 c5 c6] # BUG! you shouldn't need colnames

e = table.column(2).uniq
e.each { |x| table.create_group(x) { |r| r[2].eql?(x) } }

groups = table.groups

>> groups.attributes
>> ["E023889", "E052337", "E050441"]

>> groups["E023889"].map { |r| r[0] }
>> ["AP850KP", "AP850SDI"]

>> groups.each { |t| p t[0].c1 }
"AP850KP"
"AP850SD$"
"AP850SDA"

===============

note that in making this example, I found a small bug in Ruport's
grouping support which I will fix :)

Gavin Kistner

2/6/2007 4:53:00 PM

On Feb 6, 8:36 am, Drew Olson <olso...@gmail.com> wrote:
> I think he wants to append this array with information each time he sees
> the same key, so modify your code like so:
>
> lines = DATA.readlines.map{ |line|
> line.chomp.split( ';' )}
>
> lookup = {}
> lines.each{ |data|
> key = data.find{ |value| /^E/ =~ value }
> lookup[ key ] ||= []
> lookup[ key ] << data
>
> }

Curses, I didn't read carefully enough. Right you are. (And, though
it's not clear from his example, he might not even need to split the
original line into arrays of pieces, but just keep the lines.)

Gavin Kistner

2/6/2007 4:59:00 PM

On Feb 6, 8:36 am, Drew Olson <olso...@gmail.com> wrote:
> I think he wants to append this array with information each time he sees
> the same key [...]

So here's another version:

lookup = Hash.new{ |h,k| h[k]=[] }

DATA.each_line{ |line|
line.chomp!
unless key = line[ /\bE\w+/ ]
warn "No key in '#{line}'"
next # don't file the line under a nil key
end
lookup[ key ] << line
}

p lookup[ "E050441" ]
#=> ["AP850SDA;INCLIB;E050441;AP013;240107;0730",
#=> "AP850SDS;INCLIB;E050441;AP013;240107;0730"]

require 'pp'
pp lookup
#=> {"E050441"=>
#=> ["AP850SDA;INCLIB;E050441;AP013;240107;0730",
#=> "AP850SDS;INCLIB;E050441;AP013;240107;0730"],
#=> "E052337"=>
#=> ["AP850SD$;INCLIB;E052337;AP013;240107;0730",
#=> "AP850SDO;INCLIB;E052337;AP013;240107;0730"],
#=> "E023889"=>
#=> ["AP850KP;INCLIB;E023889;AP013;240107;0730",
#=> "AP850SDI;INCLIB;E023889;AP013;240107;0730"]}

__END__
AP850KP;INCLIB;E023889;AP013;240107;0730
AP850SD$;INCLIB;E052337;AP013;240107;0730
AP850SDA;INCLIB;E050441;AP013;240107;0730
AP850SDI;INCLIB;E023889;AP013;240107;0730
AP850SDO;INCLIB;E052337;AP013;240107;0730
AP850SDS;INCLIB;E050441;AP013;240107;0730

Rebhan, Gilbert

2/7/2007 8:47:00 AM

Hi,

-----Original Message-----
From: Phrogz [mailto:gavin@refinery.com]
Sent: Tuesday, February 06, 2007 6:00 PM
To: ruby-talk ML
Subject: Re: Parse csv similar file

On Feb 6, 8:36 am, Drew Olson <olso...@gmail.com> wrote:
> I think he wants to append this array with information each time he
sees
> the same key [...]

i still don't know how to go, so here some more notes ...

i get a folder

/timestamp
metafile.txt
/INCLIB
/PLI

metafile looks like that =
APLVZDT;INCLIB;E050441;AP013;240107;0730
AP400ER;INCLIB;E023889;AP013;240107;0730
AP540RBP;INCLIB;E052337;AP013;240107;0730
AP700PA;INCLIB;E050441;AP013;240107;0730
... more lines

field 1 is a filename
field 2 is a foldername, shows whether path is /INCLIB/file or /PLI/file
field 3 is a ticketnr
field 4 is a username
field 5 is a date
field 6 is a timestamp

i need to parse the metafile and =

1. create a folderstructure for every ticketnr that occurs, f.e.

/E050441
/INCLIB
/PLI

and put all the files that belong to that ticket
(means the line with the filename contains that ticketnr)
in the subfolder which is field 2

2. create a file in the root of the /ticketnr folder
which contains the rest of a dataset (line), means =

field 4
field 5
field 6

which are the same for every file with the same ticketnr

the format might look like

user=...
date=...
time=...

have to decide it later.

I thought with =

File.open("mycsvfile", "r").each do |line|
if line =~ /(\w+.?);(\w+);(\w+);(\w+);(\w+);(\w+)/

efas<<$3.to_s<<',' unless efas.include?($3.to_s)

i get an array with all ticketnr
then i create a folderstructure for every index in that array
and put the files in it, but i don't get it.

Any ideas ?

Regards, Gilbert

Brian Candler

2/7/2007 9:41:00 AM

On Wed, Feb 07, 2007 at 05:47:26PM +0900, Rebhan, Gilbert wrote:
> i get a folder
>
> /timestamp
> metafile.txt
> /INCLIB
> /PLI
>
> metafile looks like that =
> APLVZDT;INCLIB;E050441;AP013;240107;0730
> AP400ER;INCLIB;E023889;AP013;240107;0730
> AP540RBP;INCLIB;E052337;AP013;240107;0730
> AP700PA;INCLIB;E050441;AP013;240107;0730
> ... more lines
>
> field 1 is a filename
> field 2 is a foldername, shows whether path is /INCLIB/file or /PLI/file
> field 3 is a ticketnr
> field 4 is a username
> field 5 is a date
> field 6 is a timestamp
>
>
> i need to parse the metafile and =
>
> 1. create a folderstructure for every ticketnr that occurs, f.e.
>
> /E050441
> /INCLIB
> /PLI
>
> and put all the files that belong to that ticket
> (means the line with the filename contains that ticketnr)
> in the subfolder which is field 2
>
> 2. create a file in the root of the /ticketnr folder
> which contains the rest of a dataset (line), means =
>
> field 4
> field 5
> field 6
>
> which are the same for every file with the same ticketnr
>
> the format might look like
>
> user=...
> date=...
> time=...
>
> have to decide it later.
>
>
> I thought with =
>
> File.open("mycsvfile", "r").each do |line|
> if line =~ /(\w+.?);(\w+);(\w+);(\w+);(\w+);(\w+)/
>
> efas<<$3.to_s<<',' unless efas.include?($3.to_s)
>
> i get an array with all ticketnr
> then i create a folderstructure for every index in that array
> and put the files in it, but i don't get it.
>
> Any ideas ?

I'd do all the work on-the-fly. Untested code:

require 'fileutils'
SRCDIR = "/path_to_src"
DSTDIR = "/path_to_dst"

def copy_ticket(filename, folder, ticket, user, date, time)
  srcdir    = File.join(SRCDIR, folder)
  ticketdir = File.join(DSTDIR, ticket)
  dstdir    = File.join(ticketdir, folder)
  FileUtils.mkdir_p(dstdir)
  FileUtils.cp(File.join(srcdir, filename), File.join(dstdir, filename))

  # write out status file once per ticket, in the root of the ticket folder
  statusfile = File.join(ticketdir, "status.txt")
  unless File.exist?(statusfile)
    File.open(statusfile, "w") do |sf|
      sf.puts "user=#{user}"
      sf.puts "date=#{date}"
      sf.puts "time=#{time}"
    end
  end
end

def process_meta(f)
  f.each_line do |line|
    # [^;]+ rather than \w+ for the filename, so names like AP850SD$ match too
    next unless line =~ /^([^;]+);(\w+);(\w+);(\w+);(\w+);(\w+)$/
    copy_ticket($1,$2,$3,$4,$5,$6)
  end
end

# Main program
File.open("mycsvfile") do |f|
  process_meta(f)
end

If you want to build up a hash of ticket IDs seen, you can do that in
process_meta as well. I'd pass in an empty hash, and update it in the
each_line loop.
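
That last suggestion might look roughly like the sketch below. It assumes the six-field layout described in the question; the seen parameter and the StringIO stand-in for the metafile are illustrative only, and the copy step is left out so the snippet runs on its own.

```ruby
require 'stringio'

# Record the files listed for each ticket while reading the metafile.
# seen is a caller-supplied hash: ticket => array of [folder, filename] pairs.
def process_meta(f, seen)
  f.each_line do |line|
    fields = line.chomp.split(";")
    next unless fields.size == 6          # skip malformed lines
    filename, folder, ticket = fields     # only the first three fields are needed here
    (seen[ticket] ||= []) << [folder, filename]
  end
  seen
end

# Stand-in for the real metafile (hypothetical sample data from the question)
meta = StringIO.new(<<EOS)
APLVZDT;INCLIB;E050441;AP013;240107;0730
AP400ER;INCLIB;E023889;AP013;240107;0730
AP700PA;INCLIB;E050441;AP013;240107;0730
EOS

tickets = {}
process_meta(meta, tickets)
p tickets.keys          #=> ["E050441", "E023889"]
p tickets["E050441"]    #=> [["INCLIB", "APLVZDT"], ["INCLIB", "AP700PA"]]
```

Passing the hash in, rather than using a global, keeps process_meta easy to test with any IO-like object.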

HTH,

Brian.
