Asp Forum - reading a specific column from file

cesco

1/11/2008 12:15:00 PM

Hi,

I have a file containing four columns of data separated by tabs (\t)
and I'd like to read a specific column from it (say the third). Is
there any simple way to do this in Python?

I've found quite interesting the linecache module but unfortunately
that is (to my knowledge) only working on lines, not columns.

Any suggestion?

Thanks and regards
Francesco

8 Answers

A.T.Hofkamp

1/11/2008 12:18:00 PM

On 2008-01-11, cesco <fd.calabrese@gmail.com> wrote:
> Hi,
>
> I have a file containing four columns of data separated by tabs (\t)
> and I'd like to read a specific column from it (say the third). Is
> there any simple way to do this in Python?
>
> I've found quite interesting the linecache module but unfortunately
> that is (to my knowledge) only working on lines, not columns.
>
> Any suggestion?

the csv module may do what you want.

Fredrik Lundh

1/11/2008 12:28:00 PM

cesco wrote:

> I have a file containing four columns of data separated by tabs (\t)
> and I'd like to read a specific column from it (say the third). Is
> there any simple way to do this in Python?

use the "split" method and plain old indexing:

for line in open("file.txt"):
    columns = line.split("\t")
    print(columns[2])  # indexing starts at zero

also see the "csv" module, which can read all sorts of
comma/semicolon/tab-separated spreadsheet-style files.

> I've found quite interesting the linecache module

the "linecache" module seems to be quite popular on comp.lang.python
these days, but it's designed for a very specific purpose (displaying
Python code in tracebacks), and is a really lousy way to read text files
in the general case. please unlearn.

</F>

Chris

1/11/2008 12:28:00 PM

On Jan 11, 2:15 pm, cesco <fd.calabr...@gmail.com> wrote:
> Hi,
>
> I have a file containing four columns of data separated by tabs (\t)
> and I'd like to read a specific column from it (say the third). Is
> there any simple way to do this in Python?
>
> I've found quite interesting the linecache module but unfortunately
> that is (to my knowledge) only working on lines, not columns.
>
> Any suggestion?
>
> Thanks and regards
> Francesco

for (i, each_line) in enumerate(open('input_file.txt')):
    try:
        # split on tabs, take the third field, drop surrounding whitespace
        column_3 = each_line.split('\t')[2].strip()
    except IndexError:
        print('Not enough columns on line %i of file.' % (i+1))
        continue

    do_something_with_column_3(column_3)

Peter Otten

1/11/2008 12:33:00 PM

A.T.Hofkamp wrote:

> On 2008-01-11, cesco <fd.calabrese@gmail.com> wrote:
>> Hi,
>>
>> I have a file containing four columns of data separated by tabs (\t)
>> and I'd like to read a specific column from it (say the third). Is
>> there any simple way to do this in Python?
>>
>> I've found quite interesting the linecache module but unfortunately
>> that is (to my knowledge) only working on lines, not columns.
>>
>> Any suggestion?
>
> the csv module may do what you want.

Here's an example:

>>> print open("tmp.csv").read()
alpha beta gamma delta
one two three for

>>> import csv
>>> records = csv.reader(open("tmp.csv"), delimiter="\t")
>>> [record[2] for record in records]
['gamma', 'three']
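
The same idea as a standalone script might look like the sketch below. It assumes Python 3; the helper name `read_column` and the sample file contents are made up for illustration, and short rows are simply skipped, which may or may not be what you want.

```python
import csv
import os
import tempfile

def read_column(path, index, delimiter="\t"):
    """Return the value at position `index` from every row of the file."""
    # newline="" is what the csv documentation recommends for files given to csv.reader
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter=delimiter)
        # Rows with too few fields are skipped instead of raising IndexError.
        return [row[index] for row in reader if len(row) > index]

# Write a small tab-separated sample file (hypothetical data) to try it on.
with tempfile.NamedTemporaryFile("w", suffix=".tsv", delete=False, newline="") as tmp:
    tmp.write("alpha\tbeta\tgamma\tdelta\n")
    tmp.write("one\ttwo\tthree\tfour\n")
    sample_path = tmp.name

third_column = read_column(sample_path, 2)
os.remove(sample_path)
```

Opening the file in a `with` block closes it as soon as the column has been collected.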

Peter

Ivan Novick

1/11/2008 5:46:00 PM

On Jan 11, 4:15 am, cesco <fd.calabr...@gmail.com> wrote:
> Hi,
>
> I have a file containing four columns of data separated by tabs (\t)
> and I'd like to read a specific column from it (say the third). Is
> there any simple way to do this in Python?

You say you would like to "read" a specific column. I wonder if you
meant read all the data and then just separate out the 3rd column, or
if you really mean doing disk I/O only for the 3rd column of data,
thereby making your read faster. The second seems more interesting
but much harder, and I wonder if anyone has any ideas. As for just
filtering out the third column, you have been given many suggestions
already.

Regards,
Ivan Novick
http://www....

Reedick, Andrew

1/11/2008 6:01:00 PM

> -----Original Message-----
> From: python-list-bounces+jr9445=att.com@python.org [mailto:python-
> list-bounces+jr9445=att.com@python.org] On Behalf Of Ivan Novick
> Sent: Friday, January 11, 2008 12:46 PM
> To: python-list@python.org
> Subject: Re: reading a specific column from file
>
>
> You say you would like to "read" a specific column. I wonder if you
> meant read all the data and then just separate out the 3rd column, or
> if you really mean doing disk I/O only for the 3rd column of data,
> thereby making your read faster. The second seems more interesting
> but much harder, and I wonder if anyone has any ideas.

Do what databases do. If the columns are stored with a fixed size on
disk, then you can simply compute the offset and seek to it. If the
columns are of variable size, then you need to store (and maintain) the
offsets in some kind of index.
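
A minimal sketch of the fixed-size case, assuming Python 3 and a made-up binary layout in which every field is padded to the same width with no separators. `FIELD_WIDTH`, `write_records` and `read_field` are hypothetical names, not part of any library.

```python
import os
import tempfile

FIELD_WIDTH = 8                                 # assumed width of every field, in bytes
FIELDS_PER_RECORD = 4
RECORD_SIZE = FIELD_WIDTH * FIELDS_PER_RECORD   # no separators or newlines are stored

def write_records(path, rows):
    """Store each value padded with spaces to exactly FIELD_WIDTH bytes."""
    with open(path, "wb") as handle:
        for row in rows:
            for value in row:
                # Values are assumed to be ASCII and no longer than FIELD_WIDTH.
                handle.write(value.encode("ascii").ljust(FIELD_WIDTH))

def read_field(handle, record_number, column):
    """Compute the byte offset of one field, seek to it, and read only that field."""
    offset = record_number * RECORD_SIZE + column * FIELD_WIDTH
    handle.seek(offset)
    return handle.read(FIELD_WIDTH).rstrip().decode("ascii")

path = os.path.join(tempfile.mkdtemp(), "records.bin")
write_records(path, [("alpha", "beta", "gamma", "delta"),
                     ("one", "two", "three", "four")])

with open(path, "rb") as handle:
    third_column = [read_field(handle, n, 2) for n in range(2)]
```

Whether this actually saves disk I/O depends on the record size, since the operating system still reads whole blocks. With four short fields per record it almost certainly does not.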

Hai Vu

1/17/2008 9:47:00 AM

Here is another suggestion:

col = 2 # third column
filename = '4columns.txt'
third_column = [line[:-1].split('\t')[col] for line in open(filename, 'r')]

third_column now contains a list of items in the third column.

This solution is great for small files (up to a couple of thousand
lines). For larger files, performance could be a problem, so you might
need a different solution.
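
One possible "different solution", sketched under the assumption that the values can be processed one at a time: a generator yields each value as its line is read, so the whole column never has to sit in memory. The name `iter_column` is made up for this example, and an in-memory `io.StringIO` stands in for the real file.

```python
import io

def iter_column(lines, col, sep="\t"):
    """Yield the field at index `col` from each line, one line at a time."""
    for line in lines:
        fields = line.rstrip("\n").split(sep)
        # Lines with too few fields are skipped.
        if len(fields) > col:
            yield fields[col]

# Stand-in for open('4columns.txt'); any iterable of lines works.
sample = io.StringIO("alpha\tbeta\tgamma\tdelta\none\ttwo\tthree\tfour\n")
values = list(iter_column(sample, 2))
```

In real use you would loop over `iter_column(open(filename), col)` directly rather than calling `list()` on it.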

John Machin

1/17/2008 11:29:00 AM

On Jan 17, 8:47 pm, Hai Vu <wuh...@gmail.com> wrote:
> Here is another suggestion:
>
> col = 2 # third column
> filename = '4columns.txt'
> third_column = [line[:-1].split('\t')[col] for line in open(filename, 'r')]
>
> third_column now contains a list of items in the third column.
>
> This solution is great for small files (up to a couple of thousand
> lines). For larger files, performance could be a problem, so you might
> need a different solution.

Using the maxsplit arg could speed it up a little:

line[:-1].split('\t', col+1)[col]
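
A quick illustration of why the limit is `col+1` rather than `col`. The sample line is made up and has six fields so that the effect is visible.

```python
line = "a\tb\tc\td\te\tf\n"
col = 2  # third column

# No limit: every tab is split, producing six fields.
full = line[:-1].split("\t")

# maxsplit=col+1: stop after three splits; the remainder stays in one piece.
limited = line[:-1].split("\t", col + 1)

# maxsplit=col would be one split too few: index `col` would hold the remainder.
too_few = line[:-1].split("\t", col)

third = limited[col]
```

Here `limited` is `['a', 'b', 'c', 'd\te\tf']`, so `limited[col]` is the clean third field, while `too_few[col]` still contains everything after the second tab. With only four columns per line the saving is small, since at most one split is avoided.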

comp.lang.python
