Asp Forum - CSV Reader - comp.lang.python

michael.pearmain

2/11/2008 3:36:00 PM

Hi All,

I want to read in a CSV file, but then write out a new CSV file from a
given line.

I'm using the CSV reader, and the line where I want to start
writing the new file from begins with
"Transaction ID",

I thought it should be something along the lines of below. Obviously
this doesn't work, but any help would be great.

import csv
f = file(working_CSV, 'rb')
new_data = 0  # a counter to find where the line starts with "Transaction ID"
reader = csv.reader(f)
for data in reader:
    # read data file
    pass

# write new CSV

Cheers

Mike

13 Answers

Larry Bates

2/11/2008 3:53:00 PM

What part "obviously" doesn't work? Try something, post any tracebacks and we
will try to help. Don't ask others to write your code for you without actually
trying it yourself. It appears you are on the right track.

-Larry

michael.pearmain

2/11/2008 4:10:00 PM

Hi Larry,

I'm still getting to grips with Python, but rest assured I think it's
better for me to write the code for learning purposes.

My basic file is here. It comes up with a syntax error on the
startswith line; is this because it is potentially a list?
My idea was to get the line number where I can see Transaction ID and
then write out everything from this point into a new data file.

Would a better solution be just to use readlines and search for the
string with a counter and then write out a file from there?

Any help is greatly appreciated

Mike

working_CSV = "//filer/common/technical/Research/E2C/Template_CSV/DFAExposureToConversionQueryTool.csv"

import csv
f = file(working_CSV, 'rb')
reader = csv.reader(f)
CSV_lines = ""
for data in reader:
    if lines.startswith("Transaction ID")
        append.reader.line_num()
# there will only be 1 instance of this title at the start of the CSV file
writer(Working_csv.csv[, dialect='excel'][, fmtparam])

Reedick, Andrew

2/11/2008 4:24:00 PM

From the docs for reader: "All data read are returned as strings. No
automatic data type conversion is performed."
Just use print or repr() to see what the row data looks like. Then the
method to check for 'transaction id' should be abundantly clear.

for data in reader:
    print data
    print repr(data)
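
To illustrate what that inspection reveals, here is a minimal sketch in Python 3 syntax (the thread itself uses Python 2). The sample data is made up and stands in for the real file. It suggests that csv.reader hands back each row as a list of strings with the field quotes already stripped, so a string method such as startswith applies to a field like row[0] rather than to the row itself.

```python
import csv
import io

# Hypothetical sample data standing in for the real CSV file.
sample = io.StringIO(
    'Report,Example\n'
    '"Transaction ID","Amount"\n'
    '1001,9.99\n'
)

rows = list(csv.reader(sample))
for row in rows:
    # Each row is a list of strings; the surrounding quotes are gone.
    print(repr(row))

# The header test therefore belongs on the first field of the row.
header_rows = [row for row in rows if row and row[0].startswith("Transaction ID")]
```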

> Would a better solution be just to use readlines and search for the
> string with a counter and then write out a file from there?

Yes you could, but the danger is that you get an insanely large file
that blows out your memory or causes the process to swap to disk space
(disk is slooooooooow.)
Just loop through the lines and use a boolean flag to determine when to
start printing.


michael.pearmain

2/11/2008 4:42:00 PM

Cheers for the help, the second way looked to be the best in the end,
and thanks for the boolean idea

Mike

working_CSV = "//filer/common/technical/Research/E2C/Template_CSV/DFAExposureToConversionQueryTool.csv"

save_file = open("//filer/common/technical/Research/E2C/Template_CSV/CSV_Data2.csv","w")

CSV_Data = open(working_CSV)
data = CSV_Data.readlines()
flag=False
for record in data:
    if record.startswith('"Transaction ID"'):
        flag=True
    if flag:
        save_file.write(record)
save_file.close()

Reedick, Andrew

2/11/2008 5:03:00 PM

Don't be a pansy.

Use the csv module, or add a check for
record.startswith('Transaction ID'). There's no guarantee that csv
columns will be double-quoted. (Leading whitespace may or may not be
acceptable, too.) Look at the first piece of sample code in the
documentation for the csv module. (Section 9.1.5 in Python 2.5.) You're
99% of the way to using csv.reader() properly.

Nitpick: If the boolean check is expensive, then
    if not flag and record.startswith(...):
        flag = True

Nitpick: flag is a weak name. Use something like bPrint, okay2print,
or print_now or anything that's more descriptive. In larger and/or more
complex programs, meaningful variable names are a must.
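
A minimal sketch of the csv-module route suggested above, written in Python 3 syntax rather than the Python 2 used in the thread. The function name copy_from_header and the sample data are hypothetical, chosen only for illustration. Because csv.reader strips the quoting, the comparison is made against the bare field text, which sidesteps the question of whether the column happens to be double-quoted.

```python
import csv
import io

def copy_from_header(infile, outfile, header="Transaction ID"):
    """Copy rows to outfile, starting at the first row whose first field is `header`."""
    reader = csv.reader(infile)
    writer = csv.writer(outfile)
    found_header = False
    for row in reader:
        # row is a list of strings; blank lines come back as empty lists.
        if not found_header and row and row[0].strip() == header:
            found_header = True
        if found_header:
            writer.writerow(row)

# Hypothetical sample data standing in for the real input and output files.
source = io.StringIO(
    'Report,Example\n'
    '\n'
    '"Transaction ID","Amount"\n'
    '1001,9.99\n'
    '1002,5.00\n'
)
target = io.StringIO()
copy_from_header(source, target)
print(target.getvalue())
```

With real files in Python 3, both would normally be opened with newline="" as the csv documentation recommends, for example open(path, newline="").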


Gabriel Genellina

2/11/2008 5:54:00 PM

On Mon, 11 Feb 2008 14:41:54 -0200, Mike P
<michael.pearmain@tangozebra.com> wrote:

> CSV_Data = open(working_CSV)
> data = CSV_Data.readlines()
> flag=False
> for record in data:
>     if record.startswith('"Transaction ID"'):
>         [...]

Files are already iterable by lines. There is no need to use readlines(),
and you can avoid the already mentioned potential slowdown problem. Just
remove the data = CSV_Data.readlines() line, and change the for statement
to:
    for record in CSV_Data:
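
As a sketch of the same lazy, line-by-line idea in Python 3 syntax: the helper name lines_from_header and the sample data are hypothetical, and itertools.dropwhile is swapped in here as an alternative to the boolean flag, not something proposed in the thread. Nothing is read into memory beyond the current line.

```python
import io
import itertools

def lines_from_header(lines, prefix='"Transaction ID"'):
    """Lazily yield lines, starting at the first one that begins with `prefix`."""
    return itertools.dropwhile(lambda line: not line.startswith(prefix), lines)

# Hypothetical sample data; a real file object iterates by line the same way.
csv_data = io.StringIO(
    'junk line\n'
    '"Transaction ID","Amount"\n'
    '1001,9.99\n'
)
kept = list(lines_from_header(csv_data))
print(kept)
```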

Reading the style guide may be beneficial:
http://www.python.org/dev/peps...

--
Gabriel Genellina

michael.pearmain

2/12/2008 10:22:00 AM

I did just try to post, but it doesn't look like it has appeared?

I've used your advice, Andrew, and tried to use the CSV module, but now
it doesn't seem to pick up the startswith command?
Is this because of the way the CSV module is reading the data in?
I've looked into the module description but I can't find anything that
I should be using?

Can anyone offer any advice?

Cheers again

Mike

working_CSV = "//filer/common/technical/Research/E2C/Template_CSV/DFAExposureToConversionQueryTool.csv"

save_file = "//filer/common/technical/Research/E2C/Template_CSV/CSV_Data2.csv"

start_line=False
import csv
reader = csv.reader(open(working_CSV, "rb"))
writer = csv.writer(open(save_file, "wb"))
for row in reader:
    if not start_line and record.startswith("'Transaction ID'"):
        start_line=True
    if start_line:
        print row
        writer.writerows(rows)
#writer.close()

michael.pearmain

2/12/2008 10:37:00 AM

Just saw I needed to change record.startswith to row.startswith,
but I get the following traceback error:

Traceback (most recent call last):
  File "C:\Python25\Lib\site-packages\pythonwin\pywin\framework\scriptutils.py", line 310, in RunScript
    exec codeObject in __main__.__dict__
  File "Y:\technical\Research\E2C\Template_CSV\import CSV test.py", line 10, in <module>
    if not start_line and row.startswith('Transaction ID'):
AttributeError: 'list' object has no attribute 'startswith'

Chris

2/12/2008 11:24:00 AM

row won't have an attribute 'startswith' because each row from csv.reader
is a list and startswith is a method of a string.
Also, your code isn't exactly clear on what you want to do. If it is
just "Find the first occurrence of Transaction ID and pump the file
from then onwards into a new file", why not:

output = open('output_file.csv','wb')
start_line = False
for each_line in open('input_file.csv','rb'):
    # the header field is double-quoted in the file, so match the quotes too
    if not start_line and each_line.startswith('"Transaction ID"'):
        start_line = True
    if start_line:
        output.write( each_line )
output.close()

Also, if you need a line number for any purpose, take a look at
enumerate(), which returns a counter along with your data, e.g.
'for (line_num, each_line) in enumerate(input_file):'. Counting
starts at zero, though, so you would need to add 1.
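
A small sketch of the enumerate() idea in Python 3 syntax, with made-up sample data in place of the real file. It relies on the optional start argument of enumerate() (available from Python 2.6 onward, so not in the Python 2.5 used in this thread) to get 1-based line numbers without adding 1 by hand.

```python
import io

# Hypothetical sample data standing in for the real input file.
input_file = io.StringIO(
    'junk line\n'
    '"Transaction ID","Amount"\n'
    '1001,9.99\n'
)

header_line_num = None
for line_num, each_line in enumerate(input_file, start=1):
    if each_line.startswith('"Transaction ID"'):
        # Remember where the header sits and stop scanning.
        header_line_num = line_num
        break

print(header_line_num)
```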

michael.pearmain

2/12/2008 1:53:00 PM

Hi Chris, that's exactly what I wanted to do,

Many thanks
