comp.lang.python

Reading a large bz2 textfile exits early

Norman Rieß

2/20/2010 10:13:00 PM

Hello,

I am trying to read a large bz2-compressed text file using the bz2 module.
The file is 1717362770 lines long and 8 GB in size.
Using this code

source_file = bz2.BZ2File(file, "r")
for line in source_file:
print line.strip()

print "Exiting"
print "I used file: " + file

the loop exits cleanly after 4311 lines, in the middle of a line, and the
prints after the loop are executed.
This happened on two different boxes running different Linux distributions.
Is there something I am missing, or should this be done differently?

Thank you.

Regards,
Norman

7 Answers

Dennis Lee Bieber

2/21/2010 9:10:00 PM

On Sat, 20 Feb 2010 23:12:50 +0100, Norman Rieß <norman@smash-net.org>
declaimed the following in comp.lang.python:

> Hello,
>
> I am trying to read a large bz2-compressed text file using the bz2 module.
> The file is 1717362770 lines long and 8 GB in size.
> Using this code
>
> source_file = bz2.BZ2File(file, "r")
> for line in source_file:
> print line.strip()
>
> print "Exiting"
> print "I used file: " + file
>
> the loop exits cleanly after 4311 lines, in the middle of a line, and the
> prints after the loop are executed.
> This happened on two different boxes running different Linux distributions.
> Is there something I am missing, or should this be done differently?
>
Please verify your indentation! What you posted above is invalid in
many ways.
--
Wulfraed Dennis Lee Bieber KD6MOG
wlfraed@ix.netcom.com HTTP://wlfraed.home.netcom.com/

Norman Rieß

2/22/2010 6:50:00 AM

On 02/21/10 22:09, Dennis Lee Bieber wrote:
> On Sat, 20 Feb 2010 23:12:50 +0100, Norman Rieß<norman@smash-net.org>
> declaimed the following in comp.lang.python:
>
>
>> Hello,
>>
>> I am trying to read a large bz2-compressed text file using the bz2 module.
>> The file is 1717362770 lines long and 8 GB in size.
>> Using this code
>>
>> source_file = bz2.BZ2File(file, "r")
>> for line in source_file:
>> print line.strip()
>>
>> print "Exiting"
>> print "I used file: " + file
>>
>> the loop exits cleanly after 4311 lines, in the middle of a line, and the
>> prints after the loop are executed.
>> This happened on two different boxes running different Linux distributions.
>> Is there something I am missing, or should this be done differently?
>>
>>
> Please verify your indentation! What you posted above is invalid in
> many ways.
>
I am sorry, the indentation suffered from pasting.

This is the actual code:

source_file = bz2.BZ2File(file, "r")
for line in source_file:
    print line.strip()

print "Exiting"
print "I used file: " + file
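For comparison, a minimal Python 3 sketch of the same loop (an editorial addition, not Norman's code). It opens the file with `bz2.open` in text mode inside a `with` block, avoids shadowing the built-in name `file`, and returns the number of lines read so an early exit is easy to spot. The function name `print_stripped_lines` and the UTF-8 encoding are assumptions; `bz2.open` requires Python 3.3 or later.

```python
import bz2


def print_stripped_lines(path):
    """Print each line of a bzip2-compressed text file with surrounding
    whitespace stripped, and return the number of lines read."""
    count = 0
    # Text mode ("rt") decodes bytes to str; UTF-8 is an assumed encoding.
    with bz2.open(path, "rt", encoding="utf-8") as source_file:
        for line in source_file:
            print(line.strip())
            count += 1
    print("Exiting")
    print("I used file: " + path)
    return count
```

Called as `n = print_stripped_lines("data.bz2")` (hypothetical file name), comparing `n` with the expected line count shows at once whether the loop stopped early.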



Steven D'Aprano

2/22/2010 8:02:00 AM

On Mon, 22 Feb 2010 07:49:51 +0100, Norman Rieß wrote:

> This is the actual code:
>
> source_file = bz2.BZ2File(file, "r")
> for line in source_file:
>     print line.strip()
>
> print "Exiting"
> print "I used file: " + file


Have you verified that the bz file is good by opening it in another
application?



--
Steven
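Along the lines of Steven's suggestion, the file can also be checked from Python itself. A minimal Python 3 sketch (the function name and chunk size are illustrative): it decompresses the file in fixed-size binary chunks and counts newline bytes, so the result can be compared with the output of `bzcat file.bz2 | wc -l` (which also counts newline characters) and with the 1717362770 lines Norman reports.

```python
import bz2


def count_lines_bz2(path, chunk_size=1 << 20):
    """Count newline bytes in the decompressed contents of a bzip2 file.

    Reading fixed-size binary chunks keeps memory use flat even for a
    multi-gigabyte file.
    """
    total = 0
    with bz2.open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # empty bytes object means end of data
                break
            total += chunk.count(b"\n")
    return total
```

On an 8 GB file this takes as long as a full decompression, so it is a one-off diagnostic rather than something to run on every pass.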

Norman Rieß

2/22/2010 8:43:00 AM

On 02/22/10 09:02, Steven D'Aprano wrote:
> On Mon, 22 Feb 2010 07:49:51 +0100, Norman Rieß wrote:
>
>
>> This is the actual code:
>>
>> source_file = bz2.BZ2File(file, "r")
>> for line in source_file:
>>     print line.strip()
>>
>> print "Exiting"
>> print "I used file: " + file
>>
>
> Have you verified that the bz file is good by opening it in another
> application?
>
>
>
>

Yes, bzcat is running through the file fine. And piping bzcat output
into the python script reading stdin works fine, too.
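Norman's working variant lets bzcat do the decompression and reads plain text from standard input. A minimal Python 3 sketch of that approach; his actual script is not shown, so the function `process_stream` is hypothetical. It accepts any iterable of text lines, so it can be given `sys.stdin` in a pipeline and an ordinary list in a quick test.

```python
import sys


def process_stream(stream):
    """Print each line of a text stream with surrounding whitespace
    stripped, and return the number of lines read."""
    count = 0
    for line in stream:
        print(line.strip())
        count += 1
    return count


def main():
    # Hypothetical entry point; decompression happens upstream, e.g.
    #   bzcat file.bz2 | python3 script.py
    n = process_stream(sys.stdin)
    print("Lines read: %d" % n, file=sys.stderr)
```

To use it in a pipeline, end the script with `if __name__ == "__main__": main()`; the line count goes to stderr so it does not mix with the data on stdout.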

Lie Ryan

2/22/2010 1:30:00 PM

On 02/22/10 19:43, Norman Rieß wrote:
> On 02/22/10 09:02, Steven D'Aprano wrote:
>> On Mon, 22 Feb 2010 07:49:51 +0100, Norman Rieß wrote:
>>
>>
>>> This is the actual code:
>>>
>>> source_file = bz2.BZ2File(file, "r")
>>> for line in source_file:
>>>     print line.strip()
>>>
>>> print "Exiting"
>>> print "I used file: " + file
>>>
>>
>> Have you verified that the bz file is good by opening it in another
>> application?
>>
>>
>>
>>
>
> Yes, bzcat is running through the file fine. And piping bzcat output
> into the python script reading stdin works fine, too.

Test with something other than bzcat; bzcat does certain things
differently because of the way it works (it is a cat for bzipped files).
Try using plain "bunzip2 filename.bz2".

Norman Rieß

2/22/2010 3:38:00 PM

On 02/22/10 14:29, Lie Ryan wrote:
> On 02/22/10 19:43, Norman Rieß wrote:
>
>> On 02/22/10 09:02, Steven D'Aprano wrote:
>>
>>> On Mon, 22 Feb 2010 07:49:51 +0100, Norman Rieß wrote:
>>>
>>>
>>>
>>>> This is the actual code:
>>>>
>>>> source_file = bz2.BZ2File(file, "r")
>>>> for line in source_file:
>>>>     print line.strip()
>>>>
>>>> print "Exiting"
>>>> print "I used file: " + file
>>>>
>>>>
>>> Have you verified that the bz file is good by opening it in another
>>> application?
>>>
>>>
>>>
>>>
>>>
>> Yes, bzcat is running through the file fine. And piping bzcat output
>> into the python script reading stdin works fine, too.
>>
> Test with something other than bzcat; bzcat does certain things
> differently because of the way it works (it is a cat for bzipped files).
> Try using plain "bunzip2 filename.bz2".
>

Did that too. Works as expected.
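One hypothesis the thread does not raise, offered only as a possibility: the file may consist of several concatenated bzip2 streams, as written by parallel compressors such as pbzip2 or by `cat a.bz2 b.bz2 > c.bz2`. bzcat and bunzip2 decode every stream in such a file, whereas Python's `BZ2File` before version 3.3 silently stopped at the end of the first stream (multi-stream support was added in Python 3.3). Since stream boundaries need not fall on line breaks, this would also be consistent with the loop ending mid-line. A minimal Python 3 sketch that counts the streams; the function name and chunk size are illustrative, and `BZ2Decompressor.eof` requires Python 3.3 or later.

```python
import bz2


def count_bz2_streams(path, chunk_size=1 << 20):
    """Count the concatenated bzip2 streams in a file.

    A BZ2Decompressor handles exactly one stream. Once it reaches the
    end-of-stream marker, its `eof` attribute becomes True and any bytes
    after the marker are left in `unused_data`; a fresh decompressor is
    then needed for the next stream.
    """
    streams = 0
    decompressor = None
    pending = b""
    with open(path, "rb") as f:
        while True:
            if not pending:
                pending = f.read(chunk_size)
                if not pending:  # end of file
                    break
            if decompressor is None:
                decompressor = bz2.BZ2Decompressor()
                streams += 1
            # The decompressed output is discarded; only boundaries matter.
            decompressor.decompress(pending)
            if decompressor.eof:
                pending = decompressor.unused_data
                decompressor = None
            else:
                pending = b""
    return streams
```

A result greater than 1 would support the hypothesis. In that case the bzcat pipeline Norman already uses, or a Python 3.3+ `BZ2File`, should read the whole file.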

Stefan Behnel

2/22/2010 5:17:00 PM

Lie Ryan, 22.02.2010 14:29:
> On 02/22/10 19:43, Norman Rieß wrote:
>> On 02/22/10 09:02, Steven D'Aprano wrote:
>>> On Mon, 22 Feb 2010 07:49:51 +0100, Norman Rieß wrote:
>>>
>>>
>>>> This is the actual code:
>>>>
>>>> source_file = bz2.BZ2File(file, "r")
>>>> for line in source_file:
>>>>     print line.strip()
>>>>
>>>> print "Exiting"
>>>> print "I used file: " + file
>>>>
>>> Have you verified that the bz file is good by opening it in another
>>> application?
>>>
>>>
>>>
>>>
>> Yes, bzcat is running through the file fine. And piping bzcat output
>> into the python script reading stdin works fine, too.
>
> Test with something other than bzcat; bzcat does certain things
> differently because of the way it works (it is a cat for bzipped files).
> Try using plain "bunzip2 filename.bz2".

Please note that all of this has already been suggested on the python-tutor
list.

Stefan