Asp Forum - python dowload - comp.lang.python

monkeys paw

2/23/2010 7:42:00 PM

I used the following code to download a PDF file, but the
file was invalid after running the code. Is there a problem
with the write operation?

import urllib2
url = 'http://www.whirlpoolwaterheaters.com/downloads/651041...
a = open('adobe.pdf', 'w')
for line in urllib2.urlopen(url):
    a.write(line)

11 Answers

John Bokma

2/23/2010 8:10:00 PM

monkeys paw <monkey@joemoney.net> writes:

> I used the following code to download a PDF file, but the
> file was invalid after running the code, is there problem
> with the write operation?
>
> import urllib2
> url = 'http://www.whirlpoolwaterheaters.com/downloads/651041...
> a = open('adobe.pdf', 'w')
> for line in urllib2.urlopen(url):
>     a.write(line)

pdf is /not/ text. You're processing it like it's a text file (and
storing it like it's text, which on Windows is most likely a no no).

import urllib2

url = 'http://www.whirlpoolwaterheaters.com/downloads/651041...
response = urllib2.urlopen(url)
fh = open('adobe.pdf', 'wb')
fh.write(response.read())
fh.close()
response.close()
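
For context on why 'w' can damage a PDF: in Python 2 on Windows, a file opened in text mode rewrites every "\n" byte as "\r\n" on the way out. The sketch below imitates that translation in Python 3 with io.TextIOWrapper, since Python 3 refuses to write bytes to a text-mode file at all. The helper name write_as_windows_text and the sample payload are made up for illustration.

```python
import io

def write_as_windows_text(payload):
    """Imitate writing binary data through a Windows text-mode file."""
    raw = io.BytesIO()
    # newline="\r\n" makes the wrapper translate each "\n" to "\r\n" on write,
    # which is what text mode does on Windows.
    text = io.TextIOWrapper(raw, encoding="latin-1", newline="\r\n")
    text.write(payload.decode("latin-1"))  # latin-1 maps every byte 1:1
    text.flush()
    text.detach()  # keep raw open after the wrapper goes away
    return raw.getvalue()

original = b"%PDF-1.4\n\x00\x01\n"      # placeholder bytes, not a real PDF
mangled = write_as_windows_text(original)
print(len(original), len(mangled))      # the copy grows by one byte per "\n"
```

Any byte offsets stored inside the file no longer match after extra bytes are inserted, which is one plausible reason a viewer would reject the result.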

--
John Bokma j3b

Hacking & Hiking in Mexico - http://john...
http://castle... - Perl & Python Development

Tim Chase

2/23/2010 8:17:00 PM

monkeys paw wrote:
> I used the following code to download a PDF file, but the
> file was invalid after running the code, is there problem
> with the write operation?
>
> import urllib2
> url = 'http://www.whirlpoolwaterheaters.com/downloads/651041...
> a = open('adobe.pdf', 'w')

Sure you don't need this to be 'wb' instead of 'w'?

> for line in urllib2.urlopen(url):
>     a.write(line)

I also don't know if this "for line...a.write(line)" loop is
doing newline translation. If it's a binary file, you should use
.read() (perhaps with a modest-sized block-size, writing it in a
loop if the file can end up being large.)

-tkc
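
The block-wise .read() loop suggested above might look like the following sketch. It is written for Python 3 and uses an in-memory io.BytesIO as a stand-in for the HTTP response. The helper name copy_in_chunks and the 8192-byte block size are assumptions for illustration, not something from the thread.

```python
import io

def copy_in_chunks(src, dst, chunk_size=8192):
    """Copy src to dst in fixed-size binary blocks.

    Returns the number of reads that returned data.
    """
    reads = 0
    while True:
        block = src.read(chunk_size)
        if not block:  # an empty bytes object signals end of stream
            break
        dst.write(block)
        reads += 1
    return reads

# Stand-in for the response body; placeholder bytes, not real PDF data.
payload = bytes(range(256)) * 100
src = io.BytesIO(payload)
dst = io.BytesIO()
reads = copy_in_chunks(src, dst)
```

With a real download, src would be the object returned by urllib2.urlopen(url) (urllib.request.urlopen(url) in Python 3) and dst a file opened with 'wb'. The standard library's shutil.copyfileobj performs the same kind of loop.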

Jerry Hill

2/23/2010 8:17:00 PM

On Tue, Feb 23, 2010 at 2:42 PM, monkeys paw <monkey@joemoney.net> wrote:
> I used the following code to download a PDF file, but the
> file was invalid after running the code, is there problem
> with the write operation?
>
> import urllib2
> url = 'http://www.whirlpoolwaterheaters.com/downloads/651041...
> a = open('adobe.pdf', 'w')
> for line in urllib2.urlopen(url):
>     a.write(line)

Two guesses:

First, you need to call a.close() when you're done writing to the file.

This will happen automatically when you have no more references to the
file, but I'm guessing that you're running this code in IDLE or some
other IDE, and a is still a valid reference to the file after you run
that snippet.

Second, you're treating the pdf file as text (you're assuming it has
lines, you're not writing the file in binary mode, etc.). I don't
know if that's correct for a pdf file. I would do something like this
instead:

Python 2.6.4 (r264:75708, Oct 26 2009, 08:23:19) [MSC v.1500 32 bit (Intel)] on win32
IDLE 2.6.4

>>> import urllib2
>>> url = 'http://www.whirlpoolwaterheaters.com/downloads/651041...
>>> a = open('C:/test.pdf', 'wb')
>>> data = urllib2.urlopen(url).read()
>>> a.write(data)
>>> a.close()

That seems to work for me, in that it downloads a 16 page pdf
document, and that document opens without error or any other obvious
problems.
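
One way to make sure the file is always closed, even in an interactive session where the name stays bound, is a with block. This is a minimal sketch; the temporary directory and the placeholder bytes are assumptions standing in for the real path and the downloaded data.

```python
import os
import tempfile

# Placeholder for the bytes returned by urlopen(url).read().
data = b"%PDF-1.4 placeholder bytes\n"

# Hypothetical output location; any writable path would do.
path = os.path.join(tempfile.mkdtemp(), "test.pdf")

with open(path, "wb") as fh:
    fh.write(data)

# Leaving the with block closes the file, so the data is flushed to disk
# even though the name fh still refers to the (now closed) file object.
print(fh.closed)
```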

--
Jerry

David Robinow

2/23/2010 8:21:00 PM

ssteinerX@gmail.com

2/23/2010 8:23:00 PM

On Feb 23, 2010, at 2:42 PM, monkeys paw wrote:

> I used the following code to download a PDF file, but the
> file was invalid after running the code, is there problem
> with the write operation?
>
> import urllib2
> url = 'http://www.whirlpoolwaterheaters.com/downloads/651041...
> a = open('adobe.pdf', 'w')

Try 'wb', just in case.

S

> for line in urllib2.urlopen(url):
>     a.write(line)

monkeys paw

2/23/2010 11:08:00 PM

On 2/23/2010 3:17 PM, Tim Chase wrote:
> monkeys paw wrote:
>> I used the following code to download a PDF file, but the
>> file was invalid after running the code, is there problem
>> with the write operation?
>>
>> import urllib2
>> url = 'http://www.whirlpoolwaterheaters.com/downloads/651041...
>> a = open('adobe.pdf', 'w')
>
> Sure you don't need this to be 'wb' instead of 'w'?

'wb' does the trick. Thanks all!

Here is the final working code. I used an index (i)
to see how many reads took place; I have to assume there is
a default buffer size:

import urllib2
a = open('adobe.pdf', 'wb')
i = 0
for line in urllib2.urlopen('http://www.whirlpoolwaterheaters.com/downloads/651041...):
    i = i + 1
    a.write(line)

print "Number of reads: %d" % i
a.close()

NEW QUESTION if y'all are still reading:

Is there an integer increment operation in Python? I tried
using i++ but had to revert to 'i = i + 1'

>
>> for line in urllib2.urlopen(url):
>>     a.write(line)
>
> I also don't know if this "for line...a.write(line)" loop is doing
> newline translation. If it's a binary file, you should use .read()
> (perhaps with a modest-sized block-size, writing it in a loop if the
> file can end up being large.)
>
> -tkc
>
>

Wes James

2/23/2010 11:20:00 PM

<snip>

>
>
> NEW QUESTION if y'all are still reading:
>
> Is there an integer increment operation in Python? I tried
> using i++ but had to revert to 'i = i + 1'

i+=1

<snip>

Ethan Furman

2/23/2010 11:34:00 PM

monkeys paw wrote:
> NEW QUESTION if y'all are still reading:
>
> Is there an integer increment operation in Python? I tried
> using i++ but had to revert to 'i = i + 1'

Nope, but try i += 1.

~Ethan~
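
A short sketch of how the increment forms behave. Note that ++j is accepted by the parser but is just unary plus applied twice, so it does not change j, while i++ is a syntax error. The variable names are arbitrary.

```python
i = 0
i += 1          # augmented assignment: same effect as i = i + 1

j = 5
k = ++j         # parsed as +(+j); j is left unchanged

# i++ does not compile at all.
try:
    compile("i++", "<demo>", "exec")
    postfix_ok = True
except SyntaxError:
    postfix_ok = False

print(i, j, k, postfix_ok)
```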

Diez B. Roggisch

2/24/2010 9:40:00 PM

Am 24.02.10 00:08, schrieb monkeys paw:
> On 2/23/2010 3:17 PM, Tim Chase wrote:
>> monkeys paw wrote:
>>> I used the following code to download a PDF file, but the
>>> file was invalid after running the code, is there problem
>>> with the write operation?
>>>
>>> import urllib2
>>> url = 'http://www.whirlpoolwaterheaters.com/downloads/651041...
>>> a = open('adobe.pdf', 'w')
>>
>> Sure you don't need this to be 'wb' instead of 'w'?
>
> 'wb' does the trick. Thanks all!
>
> Here is the final working code, i used an index(i)
> to see how many reads took place, i have to assume there is
> a default buffer size:
>
> import urllib2
> a = open('adobe.pdf', 'wb')
> i = 0
> for line in urllib2.urlopen('http://www.whirlpoolwaterheaters.com/downloads/651041...):
>     i = i + 1
>     a.write(line)
>
> print "Number of reads: %d" % i
> a.close()
>
>
> NEW QUESTION if y'all are still reading:
>
> Is there an integer increment operation in Python? I tried
> using i++ but had to revert to 'i = i + 1'

Instead, use enumerate:

for i, line in enumerate(...):
    ...

Diez
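
Spelled out, the enumerate approach could look like the sketch below. An in-memory io.BytesIO stands in for the urlopen response, and the names source, sink and count are assumptions for illustration. Passing 1 as the start value makes the final count equal to the number of iterations.

```python
import io

# Stand-in for the response object; three newline-terminated "lines".
source = io.BytesIO(b"first\nsecond\nthird\n")
sink = io.BytesIO()

count = 0  # stays 0 if the source turns out to be empty
for count, line in enumerate(source, 1):
    sink.write(line)

print("Number of reads: %d" % count)
```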

aahz

2/28/2010 12:48:00 AM

In article <2fWdnXOfjat-whnWnZ2dnUVZ_rGdnZ2d@insightbb.com>,
monkeys paw <monkey@joemoney.net> wrote:
>On 2/23/2010 3:17 PM, Tim Chase wrote:
>>
>> Sure you don't need this to be 'wb' instead of 'w'?
>
>'wb' does the trick. Thanks all!
>
>import urllib2
>a = open('adobe.pdf', 'wb')
>i = 0
>for line in urllib2.urlopen('http://www.whirlpoolwaterheaters.com/downloads/651041...):
>    i = i + 1
>    a.write(line)

Using a for loop here is still a BAD IDEA -- line could easily end up
megabytes in size (though that is statistically unlikely).
--
Aahz (aahz@pythoncraft.com) <*> http://www.python...

"Many customs in this life persist because they ease friction and promote
productivity as a result of universal agreement, and whether they are
precisely the optimal choices is much less important." --Henry Spencer
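
To illustrate the point about line sizes: iterating over a binary stream splits on b"\n" bytes, so the size of each "line" depends entirely on where those bytes happen to fall, not on any buffer size. The sketch below compares that with fixed-size reads. The synthetic blob and the 64 KiB block size are assumptions chosen to make the effect visible.

```python
import io

# Synthetic binary data: 2 MiB with no newline byte, then one newline,
# then a short tail. Real PDF content would differ.
blob = b"\x00" * (2 * 1024 * 1024) + b"\n" + b"\x01" * 10

# Line iteration: piece sizes are dictated by where b"\n" occurs.
line_sizes = [len(line) for line in io.BytesIO(blob)]

# Fixed-size reads: no piece is ever larger than the requested block.
stream = io.BytesIO(blob)
block_size = 64 * 1024
chunk_sizes = [len(chunk) for chunk in iter(lambda: stream.read(block_size), b"")]

print(max(line_sizes), max(chunk_sizes))
```

This also suggests the "number of reads" counted by a for loop is really the number of newline-delimited pieces rather than the number of buffer fills.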
