
comp.lang.ruby

grep a csv?

Michael Linfield

8/16/2007 3:31:00 AM

If I had a huge CSV file and I wanted to pull out, say, all the lines
that contain the word "Blah1" and throw them into a temporary file, what
would be the best approach? My thoughts were to use

require 'rubygems'
require 'ruport'
require 'ruport/util'

t = Ruport::Data::Table.load("filename.csv")
t.grep(/Blah1/)

### this sadly only returned an output of => []

any ideas?

Thanks!
--
Posted via http://www.ruby-....

25 Answers

Michael Glaesemann

8/16/2007 3:36:00 AM



On Aug 15, 2007, at 22:30 , Michael Linfield wrote:

> If I had a huge CSV file and I wanted to pull out, say, all the lines
> that contain the word "Blah1" and throw them into a temporary file, what
> would be the best approach?

<snip />

> any ideas?

My first thought was FasterCSV and go from there. Any reason not to
use a dedicated CSV lib?

Michael Glaesemann
grzm seespotcode net



Michael Linfield

8/16/2007 3:39:00 AM


> <snip />
>
>> any ideas?
>
> My first thought was FasterCSV and go from there. Any reason not to
> use a dedicated CSV lib?
>
> Michael Glaesemann
> grzm seespotcode net

I'm planning to use Ruport to graph the data, but if you can integrate
the output from FasterCSV into a Ruport graph, I'm all ears :)
--
Posted via http://www.ruby-....

Chris Carter

8/16/2007 3:40:00 AM


On 8/15/07, Michael Linfield <globyy3000@hotmail.com> wrote:
> If I had a huge CSV file and I wanted to pull out, say, all the lines
> that contain the word "Blah1" and throw them into a temporary file, what
> would be the best approach?

<snip />

File.readlines('filename.csv').grep(/Blah1/)

--
Chris Carter
concentrationstudios.com
brynmawrcs.com

M. Edward (Ed) Borasky

8/16/2007 3:42:00 AM


Michael Linfield wrote:
> If I had a huge CSV file and I wanted to pull out, say, all the lines
> that contain the word "Blah1" and throw them into a temporary file, what
> would be the best approach?

<snip />

OK ... first of all, define "huge" and what are your restrictions? Let
me assume the worst case just to get started -- more than 256 columns
and more than 65536 rows and you're on Windows. :)

Seriously, though, if this is a *recurring* use case rather than a
one-shot "somebody gave me this *$&%^# file and wants an answer by 5 PM
tonight!" use case, I'd load it into a database (assuming your database
doesn't have a column count limitation larger than the column count in
your file, that is) and then hook up to it with DBI. But if it's a
one-shot deal and you've got a command line handy (Linux, MacOS, BSD or
Cygwin) just do "grep blah1 huge-file.csv > temp-file.csv". Bonus points
for being able to write that in Ruby and get it debugged before someone
who's been doing command-line for years types that one-liner in. :)
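
[For reference, a streaming Ruby equivalent of that shell one-liner
might look like the sketch below. The file names and the "Blah1"
pattern are placeholders; the sample file is written first only so the
snippet runs on its own.]

```ruby
# Build a tiny sample file so the sketch runs stand-alone; in practice
# "huge-file.csv" would already exist.
File.write("huge-file.csv", "a,b,Blah1\nx,y,z\nBlah1,c,d\n")

# Stream line by line (like grep) instead of slurping the whole file,
# writing only the matching lines to the temporary output.
File.open("temp-file.csv", "w") do |out|
  File.foreach("huge-file.csv") do |line|
    out.write(line) if line.include?("Blah1")
  end
end

puts File.read("temp-file.csv")
```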

Chris Carter

8/16/2007 3:43:00 AM


On 8/15/07, Michael Linfield <globyy3000@hotmail.com> wrote:
> I'm planning to use Ruport to graph the data, but if you can integrate
> the output from FasterCSV into a Ruport graph, I'm all ears :)

<snip />


Ruport uses FasterCSV for its CSV parsing.

--
Chris Carter
concentrationstudios.com
brynmawrcs.com
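
[Worth noting for later readers: FasterCSV was merged into Ruby's
standard library as CSV in Ruby 1.9, so a header-aware filter can be
sketched as below. The column names and values are made up for
illustration; a real file would come from disk already.]

```ruby
require "csv"  # in Ruby 1.9+, the stdlib CSV is the old FasterCSV

# Illustrative sample data, written here only to keep the sketch
# self-contained.
File.write("filename.csv", "name,value\nBlah1,10\nOther,20\nBlah1,30\n")

# Parse with headers, then keep only the rows whose "name" cell is Blah1.
table  = CSV.read("filename.csv", headers: true)
rows   = table.select { |row| row["name"] == "Blah1" }
values = rows.map { |row| row["value"].to_i }
puts values.inspect
```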

William James

8/16/2007 3:56:00 AM


M. Edward (Ed) Borasky wrote:
> Michael Linfield wrote:
> > <snip />
>
> OK ... first of all, define "huge" and what are your
> restrictions? Let me assume the worst case just to get
> started -- more than 256 columns and more than 65536 rows
> and you're on Windows. :)
>
> Seriously, though, if this is a *recurring* use case rather
> than a one-shot "somebody gave me this *$&%^# file and wants
> an answer by 5 PM tonight!" use case, I'd load it into a
> database (assuming your database doesn't have a column count
> limitation larger than the column count in your file, that
> is) and then hook up to it with DBI. But if it's a one-shot
> deal and you've got a command line handy (Linux, MacOS, BSD
> or Cygwin)

Windoze has a command-line.

> just do "grep blah1 huge-file.csv >
> temp-file.csv". Bonus points for being able to write that in
> Ruby and get it debugged before someone who's been doing
> command-line for years types that one-liner in. :)

Chris Carter has already done it. Have you ever posted
Ruby code here?

Michael Linfield

8/16/2007 4:08:00 AM


M. Edward (Ed) Borasky wrote:
> <snip />
>
> But if it's a
> one-shot deal and you've got a command line handy (Linux, MacOS, BSD or
> Cygwin) just do "grep blah1 huge-file.csv > temp-file.csv". Bonus points
> for being able to write that in Ruby and get it debugged before someone
> who's been doing command-line for years types that one-liner in. :)

lol, alright, let's say the scenario will be in the range of 20k - 70k
lines of data, no more than 20 columns. I want to avoid using the
command line for this, because this will in fact be used to process more
than one data file; I hope to set up optparse with a command-line arg
that directs the program to the file. Also, for the meantime, I'd rather
not throw it into any database, so I'm avoiding DBI for now. But an idea
flew through my head a few minutes ago... what if I did this --

res = []
res << File.readlines('filename.csv').grep(/Blah1/) # thanks chris

I ran into a small problem: this shoves all the grep'd data into one
element... res[1] => nil ... it's all shoved into res[0]. I'd hoped to
fix this with a simple do block, but a little confusion hit me while
trying that with a readline command. And by shoving this into an array,
will I still be able to single out columns of data? If not, then how
would I shove the grep data into a second CSV file, doing this all
inside Ruby of course, no command-line program > output.csv :)
--
Posted via http://www.ruby-....

Alex Gutteridge

8/16/2007 4:20:00 AM


On 16 Aug 2007, at 13:08, Michael Linfield wrote:

> <snip />
>
> But an idea flew through my head a few minutes ago... what if I did
> this --
>
> res = []
> res << File.readlines('filename.csv').grep(/Blah1/) # thanks chris

Array#<< appends the object onto your Array; you want to combine the
two arrays using Array#+:

irb(main):001:0> a = []
=> []
irb(main):002:0> a << [1,2,3]
=> [[1, 2, 3]]
irb(main):003:0> a = []
=> []
irb(main):004:0> a += [1,2,3]
=> [1, 2, 3]
irb(main):005:0>

Though why don't you just use:

res = File.readlines('filename.csv').grep(/Blah1/)

Alex Gutteridge

Bioinformatics Center
Kyoto University
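
[Since each element of res is still a raw CSV line, the columns can be
recovered by re-parsing each line. A sketch with an assumed sample
file, addressing the earlier "single out columns" question:]

```ruby
require "csv"

# Assumed sample data so the sketch is self-contained.
File.write("filename.csv", "this,is,Blah1\nthis,is,other\nmore,Blah1,data\n")

res = File.readlines("filename.csv").grep(/Blah1/)
# Each element of res is a raw line; CSV.parse_line splits it into fields.
columns = res.map { |line| CSV.parse_line(line) }
puts columns.inspect
```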



Michael Linfield

8/16/2007 4:29:00 AM


Alex Gutteridge wrote:
> On 16 Aug 2007, at 13:08, Michael Linfield wrote:
>
> Though why don't you just use:
>
> res = File.readlines('filename.csv').grep(/Blah1/)
>
> Alex Gutteridge

Can I push that into a file to temporarily pull all the Blah1 data
from, then delete Blah1.csv at the end of the program?


--
Posted via http://www.ruby-....

Alex Gutteridge

8/16/2007 4:58:00 AM


On 16 Aug 2007, at 13:29, Michael Linfield wrote:

> Can I push that into a file to temporarily pull all the Blah1 data
> from, then delete Blah1.csv at the end of the program?

Sure, use tempfile, but I think botp has shown why you don't really
need the temporary file (unless there's part of this problem I'm not
understanding):

irb(main):001:0> puts File.readlines('filename.csv')
this, is , a , test, foo
this, is , a , test, bar
this, is , a , test, Blah1
this, is , a , test, bar
this, Blah, is , a , test
this, is , a , Blah, test
=> nil
irb(main):002:0> puts File.readlines('filename.csv').grep(/Blah1/)
this, is , a , test, Blah1
=> nil
irb(main):003:0> require 'tempfile'
=> true
irb(main):004:0> tf = Tempfile.new('csv')
=> #<File:/tmp/csv.1339.0>
irb(main):005:0> tf.puts File.readlines('filename.csv').grep(/Blah1/)
=> nil
irb(main):006:0> tf.close
=> nil
irb(main):007:0> tf.open
=> #<File:/tmp/csv.1339.0>
irb(main):008:0> puts tf.gets
this, is , a , test, Blah1
=> nil

Alex Gutteridge

Bioinformatics Center
Kyoto University
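
[Putting the thread's pieces together as a plain script rather than an
irb session: grep the CSV into a Tempfile, read the matches back, and
remove the file when done. The file name and pattern are illustrative,
and the input is written here only so the script runs on its own.]

```ruby
require "tempfile"

# Illustrative input so the script is self-contained.
File.write("filename.csv", "this,is,Blah1\nthis,is,bar\nBlah1,a,test\n")

tf = Tempfile.new("csv")
tf.puts File.readlines("filename.csv").grep(/Blah1/)
tf.rewind                 # back to the start before reading the matches
matches = tf.readlines
puts matches.size
tf.close
tf.unlink                 # delete the temporary file explicitly
```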