comp.lang.python

Recursion limit of pickle?

Victor Lin

2/9/2008 11:50:00 AM

Hi,

I've run into a problem with pickle.
I downloaded an HTML page from:

http://www.amazon.com/Magellan-Maestro-4040-Widescreen-Navigator/dp/B000NMKHW6/ref=sr_1_2?ie=UTF8&s=electronics&qid=1202541889&sr=1-2

and parsed it with BeautifulSoup.
The page is very large. When I use pickle to dump the soup, a
RuntimeError: maximum recursion depth exceeded occurs.
At first I thought it was caused by this bug:

http://bugs.python.org/is...

but then I ruled that out: I logged pickle's recursive calls to a
file and found that the recursion limit is exceeded midway through
traversing the whole BeautifulSoup object, not by the same methods
being called over and over.
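Victor's logging code isn't shown in the thread; one way to capture such a
trace with the pure-Python pickle module is to wrap the Pickler's recursive
save() method. This is a hypothetical sketch (TracingPickler and the log
file name are my own), applied to the same soup built in the test code below:

<code>
import pickle
from cStringIO import StringIO

trace = open('pickle_trace.log', 'w')

class TracingPickler(pickle.Pickler):
    # Record the depth of every recursive save() call, so the trace
    # shows whether pickle keeps looping or just descends very deep.
    def __init__(self, *args):
        pickle.Pickler.__init__(self, *args)
        self.depth = 0

    def save(self, obj):
        self.depth += 1
        trace.write('%s%s\n' % ('  ' * self.depth, obj.__class__.__name__))
        try:
            pickle.Pickler.save(self, obj)
        finally:
            self.depth -= 1

TracingPickler(StringIO(), 2).dump(soup)
</code>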

Here is the code I used to test:

from BeautifulSoup import *
import pickle
import urllib
import sys

# Allow much deeper recursion than the default (~1000 frames).
sys.setrecursionlimit(40000)

doc = urllib.urlopen('http://www.amazon.com/Magellan-Maestro-4040-Widescreen-Navigator/dp/B000NMKHW6/ref=sr_1_2?ie=UTF8&s=electronics&qid=1202541889&sr=1-2')

soup = BeautifulSoup(doc)
print pickle.dumps(soup)

-------------------
What I want to ask is: is this caused by the recursion limit and the
stack size?

I tried cPickle first and then switched to pickle; cPickle just stopped
the program without printing any message. I suspect cPickle is also
implemented recursively, and that it too overflows the stack when
dumping the soup.

Is there a version of pickle implemented without recursion?

Thanks.

Victor Lin.
3 Answers

Gabriel Genellina

2/10/2008 3:42:00 AM


On Sat, 09 Feb 2008 09:49:46 -0200, Victor Lin <Bornstub@gmail.com>
wrote:

> I encounter a problem with pickle.
> I download a html from:
>
> http://www.amazon.com/Magellan-Maestro-4040-Widescreen-Navigator/dp/B000NMKHW6/ref=sr_1_2?ie=UTF8&s=electronics&qid=1202541889&...
>
> and parse it with BeautifulSoup.
> This page is very huge.
> When I use pickle to dump it, a RuntimeError: maximum recursion depth
> exceeded occur.

BeautifulSoup objects usually aren't pickleable, independently of your
recursion error.

py> import pickle
py> import BeautifulSoup
py> soup = BeautifulSoup.BeautifulSoup("<html><body>Hello, world!</html>")
py> print pickle.dumps(soup)
Traceback (most recent call last):
...
TypeError: 'NoneType' object is not callable
py>

Why do you want to pickle it? Store the downloaded page instead, and
rebuild the BeautifulSoup object later when needed.
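A minimal sketch of that approach (the function and file names are
illustrative, not from the thread): keep the raw HTML on disk, which
stores trivially, and re-parse it on demand:

<code>
import os
import urllib
from BeautifulSoup import BeautifulSoup

def get_soup(url, cache_path):
    # The raw HTML is cheap to store and always safe to write out...
    if not os.path.exists(cache_path):
        open(cache_path, 'wb').write(urllib.urlopen(url).read())
    # ...and the soup can be rebuilt from it whenever it is needed.
    return BeautifulSoup(open(cache_path, 'rb').read())
</code>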

--
Gabriel Genellina

Victor Lin

2/10/2008 4:09:00 AM


On Feb 10, 11:42 am, "Gabriel Genellina" <gagsl-...@yahoo.com.ar>
wrote:
> On Sat, 09 Feb 2008 09:49:46 -0200, Victor Lin <Borns...@gmail.com>
> wrote:
>
> > I encounter a problem with pickle.
> > I download a html from:
>
> >http://www.amazon.com/Magellan-Maestro-4040-Widescreen-Navi......
>
> > and parse it with BeautifulSoup.
> > This page is very huge.
> > When I use pickle to dump it, a RuntimeError: maximum recursion depth
> > exceeded occur.
>
> BeautifulSoup objects usually aren't pickleable, independently of your
> recursion error.
But I have pickled and unpickled other soup objects successfully.
Only this one seems too deep to pickle.
>
> py> import pickle
> py> import BeautifulSoup
> py> soup = BeautifulSoup.BeautifulSoup("<html><body>Hello, world!</html>")
> py> print pickle.dumps(soup)
> Traceback (most recent call last):
> ...
> TypeError: 'NoneType' object is not callable
> py>
>
> Why do you want to pickle it? Store the downloaded page instead, and
> rebuild the BeautifulSoup object later when needed.
>
> --
> Gabriel Genellina

Because parsing HTML costs a lot of CPU time, I want to cache the soup
object in a file. When I need the same page again, I can load the
already-parsed soup straight from the cache file. My program's
bottleneck is HTML parsing, so if I can parse each page once and just
unpickle it later, it would save a lot of time.
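The cache Victor has in mind would look roughly like this sketch (names
are illustrative; it assumes the soup is small enough to pickle at all,
which is exactly what fails on this page):

<code>
import os
import urllib
import cPickle
from BeautifulSoup import BeautifulSoup

def get_soup_cached(url, cache_path):
    # Reuse the parsed soup if the parsing cost was already paid once.
    if os.path.exists(cache_path):
        return cPickle.load(open(cache_path, 'rb'))
    soup = BeautifulSoup(urllib.urlopen(url).read())
    cPickle.dump(soup, open(cache_path, 'wb'), -1)
    return soup
</code>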

Gabriel Genellina

2/10/2008 8:57:00 AM


On Sun, 10 Feb 2008 02:09:12 -0200, Victor Lin <Bornstub@gmail.com>
wrote:

> On Feb 10, 11:42 am, "Gabriel Genellina" <gagsl-...@yahoo.com.ar>
> wrote:
>> On Sat, 09 Feb 2008 09:49:46 -0200, Victor Lin <Borns...@gmail.com>
>> wrote:
>>
>> > I encounter a problem with pickle.
>> > I download a html from:
>>
>> >http://www.amazon.com/Magellan-Maestro-4040-Widescreen-Navi......
>>
>> > and parse it with BeautifulSoup.
>> > This page is very huge.
>> > When I use pickle to dump it, a RuntimeError: maximum recursion depth
>> > exceeded occur.

Yes, I can reproduce the error. Worse, using cPickle instead of pickle,
Python just aborts: no exception traceback, no error printed, no
Application Error popup... (this is with Python 2.5.1 on Windows XP)

<code>
import urllib
import BeautifulSoup
import cPickle

doc = urllib.urlopen('http://www.amazon.com/Magellan-Maestro-4040-Widescreen-Navigator/dp/B000NMKHW6/ref=sr_1_2?ie=UTF8&s=electronics&qid=1202541889&sr=1-2')
soup = BeautifulSoup.BeautifulSoup(doc)
print len(cPickle.dumps(soup, -1))  # this line aborts the interpreter
</code>

That page has an insane SELECT containing 1000 OPTIONs. Removing some of
them makes cPickle happy:

<code>
div = soup.find("div", id="buyboxDivId")
select = div.find("select", attrs={"name": "quantity"})
for i in range(200):  # remove 200 options out of 1000
    select.contents[5].extract()
print len(cPickle.dumps(soup, -1))
</code>

I don't know whether this is an error in BeautifulSoup or in pickle.
That SELECT with its many OPTIONs is big, but not recursive (and I
think BeautifulSoup uses weak references to build its links); in any
case, pickle is supposed to handle recursive structures well. The
longest chain of nested tags is 32 deep; I would expect the
BeautifulSoup object to have similar nesting complexity, so the
"recursion limit exceeded" error is surprising.
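One conjecture (mine, not established in the thread): pickle recurses once
per object reference rather than once per nesting level, so if the OPTION
elements reference their siblings through ordinary attributes, a run of
1000 of them forms a 1000-link chain even in a shallow tree. A
self-contained demonstration that chain length alone triggers the error:

<code>
import pickle

class Node(object):
    def __init__(self, prev):
        self.prev = prev  # each element points at the previous sibling

node = None
for i in range(5000):  # a 5000-link chain with nesting depth 1
    node = Node(node)

pickle.dumps(node)  # RuntimeError: maximum recursion depth exceeded
</code>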

>> BeautifulSoup objects usually aren't pickleable, independently of your
>> recursion error.
> But I pickle and unpickle other soup objects successfully.
> Only this object seems too deep to pickle.

Yes, sorry, I was using an older version of BeautifulSoup.

--
Gabriel Genellina
