comp.lang.python

use fileinput to read a specific line

Joe Chiang

1/8/2008 3:16:00 AM

Hi everybody,
I'm a newbie in Python.
I need to read line 4 from a header file.
Using linecache will crash my computer due to memory loading, because
I am working on 2000 files, each of which is 8 MB.

fileinput doesn't load the file into memory first.
How do I use the fileinput module to read a specific line from a file?

for line in fileinput.FileInput('sample.txt'):
    ????
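For reference, here is a minimal Python 3 sketch of what the question literally asks for, using the `fileinput` module. It assumes plain text files and line numbers counted from 1; the helper name `line_from_each` is made up for this example. `FileInput.filelineno()` gives the line number within the current file, and `nextfile()` abandons the rest of that file, so the loop stops consuming a file once line 4 has been seen.

```python
import fileinput


def line_from_each(paths, target=4):
    """Return {filename: text of line `target`} for each file in `paths`.

    Line numbers start at 1. Files shorter than `target` lines are skipped.
    """
    if not paths:
        # An empty list would make FileInput fall back to reading stdin.
        return {}
    found = {}
    # FileInput streams the files line by line instead of loading them whole.
    with fileinput.FileInput(paths) as stream:
        for line in stream:
            if stream.filelineno() == target:
                found[stream.filename()] = line
                # Close this file and continue with the next one.
                stream.nextfile()
    return found


# Example (hypothetical file names): line_from_each(["a.txt", "b.txt"])
```

Using the `FileInput` class directly, rather than `fileinput.input()`, avoids the module-level shared state that `input()` sets up.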

14 Answers

Russ P.

1/8/2008 4:11:00 AM

On Jan 7, 7:15 pm, jo3c <JO3chi...@gmail.com> wrote:
> Hi everybody,
> I'm a newbie in Python.
> I need to read line 4 from a header file.
> Using linecache will crash my computer due to memory loading, because
> I am working on 2000 files, each of which is 8 MB.
>
> fileinput doesn't load the file into memory first.
> How do I use the fileinput module to read a specific line from a file?
>
> for line in fileinput.FileInput('sample.txt'):
>     ????

Assuming it's a text file, you could use something like this:

lnum = 0 # line number

for line in file("sample.txt"):
    lnum += 1
    if lnum >= 4: break

The variable "line" should end up with the contents of line 4 if I am
not mistaken. To handle multiple files, just wrap that code like this:

for file0 in files:

    lnum = 0 # line number

    for line in file(file0):
        lnum += 1
        if lnum >= 4: break

    # do something with "line"

where "files" is a list of the files to be read.

That's not tested.
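A tested Python 3 take on the same counting idea might look like this (a sketch; `nth_line` is a made-up name). Python 3 has no `file()` builtin, so it uses `open()` inside a `with` block, which closes the file as soon as the function returns. `enumerate(f, start=1)` replaces the hand-kept counter, and returning `None` covers a case the loop above leaves open: a file with fewer than four lines would leave `line` holding its last line, or, in the multi-file loop, a stale line from the previous file if the file is empty.

```python
def nth_line(path, n=4):
    """Return line n (counting from 1) of a text file, or None if it is shorter."""
    with open(path) as f:  # closed automatically when the function returns
        for lnum, line in enumerate(f, start=1):
            if lnum == n:
                return line
    return None


# Usage over many files (`files` is assumed to be a list of paths):
# for path in files:
#     line = nth_line(path)
#     if line is not None:
#         ...  # do something with line
```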

Dennis Lee Bieber

1/8/2008 5:41:00 AM

On Mon, 7 Jan 2008 20:10:58 -0800 (PST), "Russ P."
<Russ.Paielli@gmail.com> declaimed the following in comp.lang.python:

> for file0 in files:
>
>     lnum = 0 # line number
>
>     for line in file(file0):
>         lnum += 1
>         if lnum >= 4: break
>
>     # do something with "line"
>
> where "files" is a list of the files to be read.
>
Given that the OP is talking 2000 files to be processed, I think I'd
recommend explicit open() and close() calls to avoid having lots of I/O
structures floating around...

for fid in file_list:
    fin = open(fid)
    jnk = fin.readline()
    jnk = fin.readline()
    jnk = fin.readline()
    ln = fin.readline()
    fin.close()

Yes, coding three junk reads does mean maintenance will be a pain
(we now need the 5th line, not the fourth -- and would need to add
another jnk = line)... I'd maybe consider replacing all four readline()
with:

for cnt in xrange(4):
    ln = fin.readline()

since it doesn't need the overhead of a separate line counter/test and
will leave the fourth input line in "ln" on exit.
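The skip-and-read idea generalizes to any line number. In Python 3, `itertools.islice` can do the skipping without a counter or junk variables; this is a sketch, with `read_nth_line` as a hypothetical helper name. It accepts an open file or any other iterable of lines and stops pulling lines once line n has been taken.

```python
from itertools import islice


def read_nth_line(lines, n):
    """Return line n (counting from 1) from an open file or other line iterable.

    islice() discards the first n - 1 lines and next() takes the following
    one; None is returned if there are fewer than n lines. n must be >= 1.
    """
    return next(islice(lines, n - 1, None), None)


# Possible use inside the loop above, in place of the readline() calls:
# with open(fid) as fin:
#     ln = read_nth_line(fin, 4)
```

Unlike the `readline()` version, which yields `""` for a short file, this returns `None`, so "line 4 was blank" and "there is no line 4" stay distinguishable.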
--
Wulfraed Dennis Lee Bieber KD6MOG
wlfraed@ix.netcom.com wulfraed@bestiaria.com
HTTP://wlfraed.home.netcom.com/
(Bestiaria Support Staff: web-asst@bestiaria.com)
HTTP://www.bestiaria.com/

Russ P.

1/8/2008 6:09:00 AM


> Given that the OP is talking 2000 files to be processed, I think I'd
> recommend explicit open() and close() calls to avoid having lots of I/O
> structures floating around...

Good point. I didn't think of that. It could also be done as follows:

for fileN in files:

    lnum = 0 # line number
    input = file(fileN)

    for line in input:
        lnum += 1
        if lnum >= 4: break

    input.close()

    # do something with "line"

Six of one or half a dozen of the other, I suppose.

Russ P.

1/8/2008 6:17:00 AM

On Jan 7, 9:41 pm, Dennis Lee Bieber <wlfr...@ix.netcom.com> wrote:
> On Mon, 7 Jan 2008 20:10:58 -0800 (PST), "Russ P."
> <Russ.Paie...@gmail.com> declaimed the following in comp.lang.python:
>
> > for file0 in files:
>
> >     lnum = 0 # line number
>
> >     for line in file(file0):
> >         lnum += 1
> >         if lnum >= 4: break
>
> >     # do something with "line"
>
> > where "files" is a list of the files to be read.
>
> Given that the OP is talking 2000 files to be processed, I think I'd
> recommend explicit open() and close() calls to avoid having lots of I/O
> structures floating around...
>
> for fid in file_list:
>     fin = open(fid)
>     jnk = fin.readline()
>     jnk = fin.readline()
>     jnk = fin.readline()
>     ln = fin.readline()
>     fin.close()
>
> Yes, coding three junk reads does mean maintenance will be a pain
> (we now need the 5th line, not the fourth -- and would need to add
> another jnk = line)... I'd maybe consider replacing all four readline()
> with:
>
> for cnt in xrange(4):
>     ln = fin.readline()
>
> since it doesn't need the overhead of a separate line counter/test and
> will leave the fourth input line in "ln" on exit.

On second thought, I wonder if the reference counting mechanism would
be "smart" enough to automatically close the previous file on each
iteration of the outer loop. If so, the files don't need to be
explicitly closed.

Joe Chiang

1/8/2008 7:02:00 AM

On Jan 8, 2:08 pm, "Russ P." <Russ.Paie...@gmail.com> wrote:
> > Given that the OP is talking 2000 files to be processed, I think I'd
> > recommend explicit open() and close() calls to avoid having lots of I/O
> > structures floating around...
>
> Good point. I didn't think of that. It could also be done as follows:
>
> for fileN in files:
>
>     lnum = 0 # line number
>     input = file(fileN)
>
>     for line in input:
>         lnum += 1
>         if lnum >= 4: break
>
>     input.close()
>
>     # do something with "line"
>
> Six of one or half a dozen of the other, I suppose.

This is what I did, using glob:

import glob
for files in glob.glob('/*.txt'):
    x = open(files)
    x.readline()
    x.readline()
    x.readline()
    y = x.readline()
    # do something with y
    x.close()
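The same batch loop in Python 3, collecting the results instead of processing them inline, might look like this (a sketch; `collect_line` is a hypothetical name, and the `'/mydata/*/*/*.txt'` pattern in the usage comment is the one mentioned later in this thread). Using `with` means each file is closed before the next one is opened, and a file shorter than `n` lines is skipped instead of contributing an empty string.

```python
import glob


def collect_line(pattern, n=4):
    """Map every file matching the glob pattern to its line n (counting from 1).

    Only the first n lines of each file are consumed; files shorter than
    that are left out of the result.
    """
    results = {}
    for path in sorted(glob.glob(pattern)):
        with open(path) as f:
            for _ in range(n - 1):
                f.readline()      # discard the lines before line n
            line = f.readline()   # "" if the file has fewer than n lines
        if line:
            results[path] = line.rstrip("\n")
    return results


# e.g. dates = collect_line('/mydata/*/*/*.txt')
```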

Dennis Lee Bieber

1/8/2008 8:50:00 AM

On Mon, 7 Jan 2008 22:16:56 -0800 (PST), "Russ P."
<Russ.Paielli@gmail.com> declaimed the following in comp.lang.python:


> On second thought, I wonder if the reference counting mechanism would
> be "smart" enough to automatically close the previous file on each
> iteration of the outer loop. If so, the files don't need to be
> explicitly closed.

Hard to tell... Eventually I'd expect them to fade away, but I'm a
bit old-school... Explicit file control seems better than implied.

Martin Marcher

1/8/2008 10:05:00 AM

jo3c wrote:

> i need to read line 4 from a header file

http://docs.python.org/lib/module-line...

~/2delete $ cat data.txt
L1
L2
L3
L4

~/2delete $ python
Python 2.5.1 (r251:54863, May 2 2007, 16:56:35)
[GCC 4.1.2 (Ubuntu 4.1.2-0ubuntu4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import linecache
>>> linecache.getline("data.txt", 2)
'L2\n'
>>> linecache.getline("data.txt", 5)
''
>>> linecache.getline("data.txt", 1)
'L1\n'
>>>


--
http://noneisyours.ma...
http://feeds.feedburner.com/N...

You are not free to read this message,
by doing so, you have violated my licence
and are required to urinate publicly. Thank you.

Fredrik Lundh

1/8/2008 10:57:00 AM

Martin Marcher wrote:

>> i need to read line 4 from a header file
>
> http://docs.python.org/lib/module-line...

I guess you missed the "using linecache will crash my computer due to
memory loading, because i am working on 2000 files each is 8mb" part.

</F>

Fredrik Lundh

1/8/2008 11:00:00 AM

jo3c wrote:

> Hi everybody,
> I'm a newbie in Python.
> I need to read line 4 from a header file.
> Using linecache will crash my computer due to memory loading, because
> I am working on 2000 files, each of which is 8 MB.
>
> fileinput doesn't load the file into memory first.
> How do I use the fileinput module to read a specific line from a file?
>
> for line in fileinput.FileInput('sample.txt'):
>     ????

I could have sworn that I posted working code (including an explanation
why linecache wouldn't work) the last time you asked about this... yes,
here it is again:

> I have 2000 files with a header and data.
> I need to get the date information from the header,
> then insert it into my database.
> I am doing it in batch, so I use glob.glob('/mydata/*/*/*.txt').
> To get the date on line 4 of the txt file I use
> linecache.getline('/mydata/myfile.txt', 4)
>
> but if I use
> linecache.getline(glob.glob('/mydata/*/*/*.txt'), 4) it won't work

glob.glob returns a list of filenames, so you need to call getline once
for each file in the list.

but using linecache is absolutely the wrong tool for this; it's designed
for *repeated* access to arbitrary lines in a file, so it keeps all the
data in memory. that is, all the lines, for all 2000 files.
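To make the memory argument concrete: `linecache.getline()` reads the whole file and keeps all of its lines in a module-level cache, so every file looked up stays resident until the cache is cleared. If one did want to keep `linecache`, a sketch like the following (with the hypothetical helper `line_via_linecache`) empties the cache with `linecache.clearcache()` after every file. Each 8 MB file is still read in full, so the streaming approaches in this thread remain the better fit.

```python
import glob
import linecache


def line_via_linecache(pattern, n=4):
    """Return {path: line n} for files matching pattern, using linecache.

    clearcache() after each file keeps at most one file's lines cached,
    but getline() still reads every file completely.
    """
    results = {}
    for path in sorted(glob.glob(pattern)):
        line = linecache.getline(path, n)  # "" if the file is too short
        if line:
            results[path] = line
        linecache.clearcache()  # drop this file's lines from the cache
    return results
```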

if the files are small, and you want to keep the code short, it's easier
to just grab the file's content and use indexing on the resulting list:

import glob

for filename in glob.glob('/mydata/*/*/*.txt'):
    line = list(open(filename))[4-1]
    # ... do something with line ...

(note that line numbers usually start with 1, but Python's list indexing
starts at 0).
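The 1-based versus 0-based detail is where this approach most often goes wrong, and `list(...)[4-1]` raises `IndexError` on a file with fewer than four lines. A small Python 3 sketch (`nth_line_small` is a hypothetical name) that guards against both:

```python
def nth_line_small(path, n=4):
    """Read the whole file into a list and index it; only sensible for small files."""
    with open(path) as f:
        lines = f.readlines()
    # Line numbers start at 1 but list indices start at 0, hence n - 1.
    # The range check also stops n = 0 from silently returning lines[-1].
    if 1 <= n <= len(lines):
        return lines[n - 1]
    return None
```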

if the files might be large, use something like this instead:

import glob

for filename in glob.glob('/mydata/*/*/*.txt'):
    f = open(filename)
    # skip first three lines
    f.readline(); f.readline(); f.readline()
    # grab the line we want
    line = f.readline()
    # ... do something with line ...

</F>

Steven D'Aprano

1/8/2008 1:12:00 PM

On Mon, 07 Jan 2008 22:16:56 -0800, Russ P. wrote:

> On second thought, I wonder if the reference counting mechanism would
> be "smart" enough to automatically close the previous file on each
> iteration of the outer loop. If so, the files don't need to be
> explicitly closed.

Python guarantees[1] that files will be closed, but doesn't specify when
they will be closed. I understand that Jython doesn't automatically close
files until the program terminates, so even if you could rely on the ref
counter to close the files in CPython, it won't be safe to do so in
Jython. I don't know about IronPython or PyPy or the semi-mythical Parrot.

Given how little effort it is to explicitly close the files yourself, I
don't see any reason to not close them, rather than relying on an
implementation-dependent feature.
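Explicit closing can also be made to survive errors. A minimal sketch, assuming Python 3 (`header_line` is a hypothetical name): the `try`/`finally` block closes the file whether the function returns normally or an exception escapes, regardless of how the interpreter manages memory. In current Python, `with open(path) as f:` gives the same guarantee more compactly.

```python
def header_line(path, n=4):
    """Return line n (counting from 1) of path, or "" if the file is shorter."""
    f = open(path)
    try:
        line = ""
        for _ in range(n):
            line = f.readline()  # readline() returns "" at end of file
        return line
    finally:
        f.close()  # runs on return and on exceptions alike
```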



[1] Guarantee void under any circumstance that prevents files from being
closed.

--
Steven