Gabriel Genellina
1/21/2008 6:42:00 PM
On Mon, 21 Jan 2008 14:12:43 -0200, Arne <arne.k.h@gmail.com> wrote:
> I'm trying to make an RSS reader in Python just for fun, and I'm almost
> finished. I don't have any syntax errors, but when I run my program,
> nothing happens.
>
> This program is supposed to download an .xml file, save the contents in
> a buffer file (buffer.txt) and parse the file looking for start tags.
> When it has found a start tag, it assumes that the content (between the
> start tag and the end tag) is on the same line, so it removes the
> start tag and the end tag, saves the content and puts it into a
> database.
That's a gratuitous assumption and may not hold for many sources; you
should use a proper XML parser instead (using ElementTree, for example, is
even easier than your sequence of find and replace).
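To give an idea of what that could look like, here is a minimal sketch using
xml.etree.ElementTree from the standard library. It is written for Python 3
(the code in this thread is Python 2), and the sample feed, the name
parse_items and the example.com links are made up for illustration; a real
feed may use namespaces or leave out elements, which this does not handle
beyond a default value.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed, invented for this sketch. Note that the first
# item's title spans several lines, which a line-by-line search would miss.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item>
      <title>
        First post
      </title>
      <link>http://example.com/1</link>
      <description>Hello</description>
    </item>
    <item>
      <title>Second post</title>
      <link>http://example.com/2</link>
      <description>World</description>
    </item>
  </channel>
</rss>
"""


def parse_items(xml_text):
    """Return a list of (title, link, description) tuples, one per <item>."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        # findtext() returns the default when the child element is missing.
        title = item.findtext("title", default="").strip()
        link = item.findtext("link", default="").strip()
        description = item.findtext("description", default="").strip()
        items.append((title, link, description))
    return items


for row in parse_items(SAMPLE_RSS):
    print(row)
```

The parser does not care how the text is broken into lines, so the
"content is on the same line" assumption is no longer needed.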
> The problem is that I can't find the data in the database! If I watch
> my program while I'm running it, I can see that it successfully
> downloads the .xml file from the web and saves it in the buffer.
OK. So the problem must be in one of three places: when you read the buffer
back, when you process it, or when you save to the database.
It's very strange to create the table each time you want to save anything,
but this gives you another clue: the table is created and remains empty,
else the select statement in print_rss would have failed. So you know that
those lines are executed. Now, the print statement is your friend:
    self.buffer = file('buffer.txt')
    for line in self.buffer.readline():
        print "line=",line   # add this and see what you get
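If you'd rather not run the whole program, the following small sketch shows
what that print is likely to reveal. It is Python 3, and it uses io.StringIO
with invented content as a stand-in for buffer.txt, so treat it as an
illustration of the behaviour rather than of your actual data.

```python
import io

# Stand-in for the buffer file, so the sketch runs without touching disk.
buf = io.StringIO("<title>First</title>\n<link>http://example.com/1</link>\n")

# readline() returns ONE string (the first line); looping over a string
# yields its characters one at a time.
chars = [piece for piece in buf.readline()]

buf.seek(0)
# Looping over the file object itself yields whole lines.
lines = [line for line in buf]

print(chars[:7])
print(lines)
```

In other words, "for line in f.readline()" walks over the characters of the
first line only, while "for line in f" walks over the lines of the file.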
Once you get your code working, it's time to analyze it. I think someone
told you "in Python, you have to use self. everywhere" and you took it
literally. Let's see:
    def update_buffer(self):
        self.buffer = file('buffer.txt', 'w')
        self.temp_buffer = urllib2.urlopen(self.rssurl).read()
        self.buffer.write(self.temp_buffer)
        self.buffer.close()
All those "self." are unneeded and wrong. You *can*, and *should*, use
local variables. Perhaps it's a bit hard to grasp at first, but local
variables, instance attributes and global variables are different things
used for different purposes. I'll try an example: you [an object] have a
diary, where you record things that you have to remember [your instance
attributes, or "data members" as they are called in other languages]. You
also carry a tiny notepad in your pocket, where you make a few notes while
you are doing something, but you always throw away the page once the job
is finished [local variables]. Your brothers, sisters and parents [other
objects] use the same scheme, but there is a whiteboard in the kitchen
where important things that all of you have to know are recorded [global
variables] (anybody can read and write on the board).
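The same picture in code might look like the sketch below. Everything in it
(Person, go_shopping, family_notices) is invented for illustration, and it
uses Python 3 syntax.

```python
# The whiteboard in the kitchen: a module-level (global) name.
family_notices = []


class Person:
    def __init__(self, name):
        # The diary: instance attributes survive between method calls.
        self.name = name
        self.diary = []

    def go_shopping(self, prices):
        # The notepad page: a local variable, gone when the method returns.
        total = 0
        for price in prices.values():
            total += price
        # Only what is worth remembering goes into the diary...
        self.diary.append("spent %d" % total)
        # ...and only what everybody must know goes on the whiteboard.
        family_notices.append("%s went shopping" % self.name)
        return total
```

After go_shopping returns, "total" no longer exists anywhere; the diary entry
and the whiteboard note are all that remain.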
Now, back to the code: why "self." everywhere? Let's see, self.buffer is a
file: opened, written, and closed, all inside the same function. Once it's
closed, there is no need to keep a reference to the file elsewhere. It's
discardable, like your notepad pages: use a local variable instead. In fact,
*all* your variables should be locals; the *only* things you should keep
inside your object are rssurl and the database location, and perhaps
temp_buffer (with another, more meaningful name; rssdata, for example).
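As a rough sketch of that advice, update_buffer could look something like
this. It is Python 3 (urllib.request instead of urllib2), and the names
RssReader, dbpath, fetch_url and the "fetch" parameter are my own
assumptions, not taken from your program; "fetch" is only there so the
download can be swapped out when trying the code without a network.

```python
import urllib.request


def fetch_url(url):
    """Download url and return the raw bytes."""
    with urllib.request.urlopen(url) as response:
        return response.read()


class RssReader:
    def __init__(self, rssurl, dbpath, fetch=fetch_url):
        # Only the long-lived things are kept on the object.
        self.rssurl = rssurl
        self.dbpath = dbpath
        self.fetch = fetch  # hypothetical hook, replaceable for testing

    def update_buffer(self, filename="buffer.txt"):
        rssdata = self.fetch(self.rssurl)   # local: no "self." needed
        with open(filename, "wb") as f:     # local: closed on leaving the block
            f.write(rssdata)
        return rssdata
```

The file object and the downloaded data live only as long as the method runs;
the caller gets the data back as a return value and decides what to do with it.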
Other -more or less random- remarks:
    if self.titleStored == True and self.linkStored == True and descriptionStored == True:
Don't compare against True/False. Just use their boolean value:
    if titleStored and linkStored and descriptionStored:
Your code resets those flags at *every* line read, and since a line
contains at most one tag, they will never be True at the same time. You
should reset the flags only after you have got the three items and written
them to the database.
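To illustrate where the reset belongs, here is a minimal Python 3 sketch. It
keeps your one-tag-per-line assumption, replaces the three flags with a
single dict (my substitution, not your code), and collects the records in a
list where your program would write to the database. The function name and
the sample lines are invented. A real feed also has channel-level title, link
and description elements, which this simple loop would pick up as a record.

```python
def collect_items(lines):
    """Group title/link/description values into one record per item."""
    records = []
    current = {}  # fields seen so far for the item in progress
    for line in lines:
        line = line.strip()
        for tag in ("title", "link", "description"):
            start, end = "<%s>" % tag, "</%s>" % tag
            if line.startswith(start) and line.endswith(end):
                current[tag] = line[len(start):-len(end)]
        # Reset ONLY once all three fields are present, not on every line.
        if len(current) == 3:
            records.append(
                (current["title"], current["link"], current["description"])
            )
            current = {}
    return records


feed = (
    "<title>A</title>\n"
    "<link>http://example.com/a</link>\n"
    "<description>first</description>\n"
    "<title>B</title>\n"
    "<link>http://example.com/b</link>\n"
    "<description>second</description>\n"
)
print(collect_items(feed.splitlines()))
```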
The RSS feed, after being read, is available in self.temp_buffer; why do
you read it again from the buffer file? If you want to iterate over the
individual lines, use:
    for line in self.temp_buffer.splitlines():
--
Gabriel Genellina