Asp Forum - Download unnamed web image?

galileo228

2/17/2010 12:32:00 AM

All,

My python program signs onto the student facebook at my school and,
given email addresses, returns the associated full name. If I were to
do this through a regular browser, there is also a picture of the
individual, and I am trying to get my program to download the picture
as well. The problem: the html code of the page does not point to a
particular file, but rather refers to (what seems like) a query.

So, if one went to the facebook and searched for me using my school
net id (msb83), the image of my profile on the results page is:

<img width="100" height="130" border="0" class="border" alt="msb83"
src="deliverImage.cfm?netid=MSB83">

Using BeautifulSoup, mechanize, and urllib, I've constructed the
following:

br.open("http://www.school.edu/students/faceb...)
br.select_form(nr = 1)

br.form['fulltextsearch'] = 'msb83' # this searches the facebook for me
br.submit()
results = br.response().read()
soup = BeautifulSoup(results)
foo2 = soup.find('td', attrs={'width':'95'})
foo3 = foo2.find('a')
foo4 = foo3.find('img', attrs={'src':'deliverImage.cfm?netid=msb83'})
# this just drills down to the <img> line; up to this point the
# program does not return an error

save_as = os.path.join('./', msb83 + '.jpg')
urllib.urlretrieve(foo4, save_as)

I get the following error msg after running this code:

AttributeError: 'NoneType' object has no attribute 'strip'

I can download the picture through my browser by right-clicking,
selecting save as, and then the image gets saved as
'deliverImage.cfm.jpeg.'

Are there any suggestions as to how I might be able to download the
image using python?

Please let me know if more information is needed -- happy to supply
it.

Matt

5 Answers

John Bokma

2/17/2010 1:48:00 AM

galileo228 <mattbarkan@gmail.com> writes:

> Using BeautifulSoup, mechanize, and urllib, I've constructed the
> following:
>
> br.open("http://www.school.edu/students/faceb...)
> br.select_form(nr = 1)
>
> br.form['fulltextsearch'] = 'msb83' # this searches the facebook for
> me
> br.submit()
> results = br.response().read()
> soup = BeautifulSoup(results)
> foo2 = soup.find('td', attrs={'width':'95'})
> foo3 = foo2.find('a')
> foo4 = foo3.find('img', attrs={'src':'deliverImage.cfm?netid=msb83'})
> # this just drills down to the <img> line and until this point the
> program does not return an error
>
> save_as = os.path.join('./', msb83 + '.jpg')
> urllib.urlretrieve(foo4, save_as)
>
> I get the following error msg after running this code:
>
> AttributeError: 'NoneType' object has no attribute 'strip'

Wild guess, since you didn't provide line numbers, etc.

foo4 is None

(I would also suggest using more meaningful names.)
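
The guess above can be checked step by step: a lookup that finds nothing hands back None, and the error only shows up later when that None is used. A minimal sketch, assuming a hypothetical `require` helper and a plain dictionary as a stand-in for the parsed page (so it runs without BeautifulSoup); the descriptive names are illustrative, not taken from the original program:

```python
def require(value, description):
    """Return value unchanged, or raise a clear error if a lookup came back as None."""
    if value is None:
        raise LookupError("could not find %s" % description)
    return value


# Stand-in for soup.find(...): dict.get() also returns None on a miss.
page = {"photo_cell": {"profile_link": {"image": None}}}

photo_cell = require(page.get("photo_cell"), "the <td> holding the photo")
profile_link = require(photo_cell.get("profile_link"), "the <a> inside that cell")

try:
    image_tag = require(profile_link.get("image"), "the <img> with the expected src")
except LookupError as exc:
    failed_step = str(exc)
    print(failed_step)
```

With a check like this after each find(), the failure names the step that came back empty instead of surfacing later inside urllib.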

--
John Bokma j3b

Hacking & Hiking in Mexico - http://john...
http://castle... - Perl & Python Development

galileo228

2/17/2010 2:40:00 AM

On Feb 16, 8:48 pm, John Bokma <j...@castleamber.com> wrote:
> galileo228 <mattbar...@gmail.com> writes:
> > Using BeautifulSoup, mechanize, and urllib, I've constructed the
> > following:
>
> > br.open("http://www.school.edu/students/faceb...)
> > br.select_form(nr = 1)
>
> > br.form['fulltextsearch'] = 'msb83' # this searches the facebook for
> > me
> > br.submit()
> > results = br.response().read()
> > soup = BeautifulSoup(results)
> > foo2 = soup.find('td', attrs={'width':'95'})
> > foo3 = foo2.find('a')
> > foo4 = foo3.find('img', attrs={'src':'deliverImage.cfm?netid=msb83'})
> > # this just drills down to the <img> line and until this point the
> > program does not return an error
>
> > save_as = os.path.join('./', msb83 + '.jpg')
> > urllib.urlretrieve(foo4, save_as)
>
> > I get the following error msg after running this code:
>
> > AttributeError: 'NoneType' object has no attribute 'strip'
>
> Wild guess, since you didn't provide line numbers, etc.
>
> foo4 is None
>
> (I also would like to suggest to use more meaningful names)
>
> --
> John Bokma j3b

I thought it was too, and I just double-checked. It's actually

foo3 = foo2.find('a')

that is causing the NoneType error.

Thoughts?

galileo228

2/17/2010 2:55:00 AM

On Feb 16, 9:40 pm, galileo228 <mattbar...@gmail.com> wrote:
> On Feb 16, 8:48 pm, John Bokma <j...@castleamber.com> wrote:
>
>
>
> > galileo228 <mattbar...@gmail.com> writes:
> > > Using BeautifulSoup, mechanize, and urllib, I've constructed the
> > > following:
>
> > > br.open("http://www.school.edu/students/faceb...)
> > > br.select_form(nr = 1)
>
> > > br.form['fulltextsearch'] = 'msb83' # this searches the facebook for
> > > me
> > > br.submit()
> > > results = br.response().read()
> > > soup = BeautifulSoup(results)
> > > foo2 = soup.find('td', attrs={'width':'95'})
> > > foo3 = foo2.find('a')
> > > foo4 = foo3.find('img', attrs={'src':'deliverImage.cfm?netid=msb83'})
> > > # this just drills down to the <img> line and until this point the
> > > program does not return an error
>
> > > save_as = os.path.join('./', msb83 + '.jpg')
> > > urllib.urlretrieve(foo4, save_as)
>
> > > I get the following error msg after running this code:
>
> > > AttributeError: 'NoneType' object has no attribute 'strip'
>
> > Wild guess, since you didn't provide line numbers, etc.
>
> > foo4 is None
>
> > (I also would like to suggest to use more meaningful names)
>
> > --
> > John Bokma j3b
>
> I thought it was too, and I just doublechecked. It's actually
>
> foo3 = foo2.find('a')
>
> that is causing the NoneType error.
>
> Thoughts?

I've now fixed the foo3 issue, and I now know that the problem is with
the urllib.urlretrieve line (see above). This is the error msg I get
in IDLE:

Traceback (most recent call last):
  File "/Users/Matt/Documents/python/dtest.py", line 59, in <module>
    urllib.urlretrieve(foo4, save_as)
  File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/urllib.py", line 94, in urlretrieve
    return _urlopener.retrieve(url, filename, reporthook, data)
  File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/urllib.py", line 226, in retrieve
    url = unwrap(toBytes(url))
  File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/urllib.py", line 1033, in unwrap
    url = url.strip()
TypeError: 'NoneType' object is not callable

Is this msg being generated because I'm trying to retrieve a url
that's not really a file?
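
On that last question: a URL with a query string, such as deliverImage.cfm?netid=MSB83, is still an ordinary URL. The server decides which bytes to send back and normally describes them in the Content-Type response header, which is presumably why the browser saves the picture with a .jpeg suffix. A small Python 3 sketch of choosing a file extension from that header; the `extension_for` helper and its table are illustrative and not part of urllib:

```python
# Illustrative mapping only; extend it for other image types as needed.
EXTENSIONS = {"image/jpeg": ".jpg", "image/png": ".png", "image/gif": ".gif"}


def extension_for(content_type, default=".bin"):
    """Map a Content-Type header value to a file extension, ignoring parameters and case."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    return EXTENSIONS.get(media_type, default)


print(extension_for("image/jpeg"))
print(extension_for("Image/PNG; charset=binary"))
print(extension_for("text/html"))
```

If the header comes back as text/html instead of an image type, one possibility worth checking is that the request was made without the logged-in session, since a separate urllib call does not share the cookies held by the mechanize browser.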

John Bokma

2/17/2010 3:11:00 AM

galileo228 <mattbarkan@gmail.com> writes:

> On Feb 16, 9:40 pm, galileo228 <mattbar...@gmail.com> wrote:

[...]

> I've now fixed the foo3 issue, and I now know that the problem is with
> the urllib.urlretrieve line (see above). This is the error msg I get
> in IDLE:
>
> Traceback (most recent call last):
> File "/Users/Matt/Documents/python/dtest.py", line 59, in <module>
> urllib.urlretrieve(foo4, save_as)
> File "/Library/Frameworks/Python.framework/Versions/2.6/lib/
> python2.6/urllib.py", line 94, in urlretrieve
> return _urlopener.retrieve(url, filename, reporthook, data)
> File "/Library/Frameworks/Python.framework/Versions/2.6/lib/
> python2.6/urllib.py", line 226, in retrieve
> url = unwrap(toBytes(url))
> File "/Library/Frameworks/Python.framework/Versions/2.6/lib/
> python2.6/urllib.py", line 1033, in unwrap
> url = url.strip()
> TypeError: 'NoneType' object is not callable
>
> Is this msg being generated because I'm trying to retrieve a url
> that's not really a file?

--8<---------------cut here---------------start------------->8---
>>> import urllib;
>>> urllib.urlretrieve(None)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.5/urllib.py", line 89, in urlretrieve
    return _urlopener.retrieve(url, filename, reporthook, data)
  File "/usr/lib/python2.5/urllib.py", line 210, in retrieve
    url = unwrap(toBytes(url))
  File "/usr/lib/python2.5/urllib.py", line 1009, in unwrap
    url = url.strip()
AttributeError: 'NoneType' object has no attribute 'strip'
--8<---------------cut here---------------end--------------->8---

To me it looks like you're still calling urlretrieve with None as a
first value.
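
Note that the session above ends in AttributeError, while the traceback quoted from IDLE ends in TypeError: 'NoneType' object is not callable. One possible explanation, offered as an assumption and not confirmed anywhere in the thread, is that foo4 is now a BeautifulSoup tag and no longer None. BeautifulSoup 3 tags treat an unknown attribute such as tag.strip as a search for a child tag of that name and return None when there is none, so url.strip() ends up calling None. The `FakeTag` class below is a hypothetical stand-in that imitates only that behaviour:

```python
class FakeTag(object):
    """Hypothetical stand-in: unknown attributes resolve to None, like a failed child-tag search."""

    def __getattr__(self, name):
        return None


def strip_error(url):
    """Call url.strip() the way urllib's unwrap() does and report the exception type raised."""
    try:
        url.strip()
    except (AttributeError, TypeError) as exc:
        return type(exc).__name__
    return None


error_for_none = strip_error(None)        # None has no attribute 'strip'
error_for_tag = strip_error(FakeTag())    # .strip resolves to None, then None is called
error_for_str = strip_error(" deliverImage.cfm?netid=MSB83 ")  # a real string is fine
print(error_for_none, error_for_tag, error_for_str)
```

If that reading is right, the two error messages point at two different values being passed: None in the first case, and a tag object (instead of its src string) in the second.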

--
John Bokma j3b

Hacking & Hiking in Mexico - http://john...
http://castle... - Perl & Python Development

Matthew Barnett

2/17/2010 3:15:00 AM

galileo228 wrote:
> On Feb 16, 9:40 pm, galileo228 <mattbar...@gmail.com> wrote:
>> On Feb 16, 8:48 pm, John Bokma <j...@castleamber.com> wrote:
>>
>>
>>
>>> galileo228 <mattbar...@gmail.com> writes:
>>>> Using BeautifulSoup, mechanize, and urllib, I've constructed the
>>>> following:
>>>> br.open("http://www.school.edu/students/faceb...)
>>>> br.select_form(nr = 1)
>>>> br.form['fulltextsearch'] = 'msb83' # this searches the facebook for
>>>> me
>>>> br.submit()
>>>> results = br.response().read()
>>>> soup = BeautifulSoup(results)
>>>> foo2 = soup.find('td', attrs={'width':'95'})
>>>> foo3 = foo2.find('a')
>>>> foo4 = foo3.find('img', attrs={'src':'deliverImage.cfm?netid=msb83'})
>>>> # this just drills down to the <img> line and until this point the
>>>> program does not return an error
>>>> save_as = os.path.join('./', msb83 + '.jpg')
>>>> urllib.urlretrieve(foo4, save_as)
>>>> I get the following error msg after running this code:
>>>> AttributeError: 'NoneType' object has no attribute 'strip'
>>> Wild guess, since you didn't provide line numbers, etc.
>>> foo4 is None
>>> (I also would like to suggest to use more meaningful names)
>>> --
>>> John Bokma j3b
>> I thought it was too, and I just doublechecked. It's actually
>>
>> foo3 = foo2.find('a')
>>
>> that is causing the NoneType error.
>>
>> Thoughts?
>
> I've now fixed the foo3 issue, and I now know that the problem is with
> the urllib.urlretrieve line (see above). This is the error msg I get
> in IDLE:
>
> Traceback (most recent call last):
> File "/Users/Matt/Documents/python/dtest.py", line 59, in <module>
> urllib.urlretrieve(foo4, save_as)
> File "/Library/Frameworks/Python.framework/Versions/2.6/lib/
> python2.6/urllib.py", line 94, in urlretrieve
> return _urlopener.retrieve(url, filename, reporthook, data)
> File "/Library/Frameworks/Python.framework/Versions/2.6/lib/
> python2.6/urllib.py", line 226, in retrieve
> url = unwrap(toBytes(url))
> File "/Library/Frameworks/Python.framework/Versions/2.6/lib/
> python2.6/urllib.py", line 1033, in unwrap
> url = url.strip()
> TypeError: 'NoneType' object is not callable
>
> Is this msg being generated because I'm trying to retrieve a url
> that's not really a file?

It's because the URL you're passing in, namely foo4, is None. This is
presumably because foo3.find() returns None if it can't find the entry.

You checked the value of foo3, but did you check the value of foo4?
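
Checking foo4 is worthwhile for another reason as well: the markup in the original post has netid=MSB83 in upper case, while the search looks for netid=msb83, and matching an attribute against an exact string is case-sensitive. Even when the tag is found, urlretrieve needs a URL string, so the src value has to be pulled out and resolved against the address of the page. A sketch in Python 3 using only the standard library; `ImageSrcFinder`, `find_image_src`, and the results.cfm page address are hypothetical (the real URL is truncated in the post):

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageSrcFinder(HTMLParser):
    """Record the src of the first <img> whose src mentions the net id, ignoring case."""

    def __init__(self, netid):
        super().__init__()
        self.wanted = ("netid=" + netid).lower()
        self.src = None

    def handle_starttag(self, tag, attrs):
        if tag != "img" or self.src is not None:
            return
        src = dict(attrs).get("src") or ""
        if self.wanted in src.lower():
            self.src = src


def find_image_src(html, netid):
    """Return the matching src attribute as a string, or None if no image matches."""
    finder = ImageSrcFinder(netid)
    finder.feed(html)
    return finder.src


html = ('<td width="95"><a href="#"><img width="100" height="130" border="0" '
        'class="border" alt="msb83" src="deliverImage.cfm?netid=MSB83"></a></td>')

# Hypothetical address of the results page; substitute the real one.
page_url = "http://www.school.edu/students/results.cfm"

src = find_image_src(html, "msb83")
if src is None:
    raise SystemExit("no matching <img> found")

# The src is relative, so resolve it against the page it came from.
image_url = urljoin(page_url, src)
print(image_url)
# To download: urllib.request.urlretrieve(image_url, "msb83.jpg")
```

With BeautifulSoup the equivalent step would be to read the tag's src attribute (foo4['src']) and join it to the page URL before passing it on. If the site requires the login session, the download has to go through the same mechanize browser; mechanize browsers appear to offer a retrieve() method for this, which is worth verifying against the mechanize documentation.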

comp.lang.python
