comp.lang.python

Parsing Email Headers

T

3/11/2010 7:30:00 PM

All I'm looking to do is to download messages from a POP account and
retrieve the sender and subject from their headers. Right now I'm 95%
of the way there, except I can't seem to figure out how to *just* get
the headers. Problem is, certain email clients also include headers
in the message body (i.e. if you're replying to a message), and these
are all picked up as additional senders/subjects. So, I want to avoid
processing anything from the message body.

Here's a sample of what I have:

    # For each line in message
    for j in M.retr(i+1)[1]:
        # Create email message object from returned string
        emailMessage = email.message_from_string(j)
        # Get fields
        fields = emailMessage.keys()
        # If email contains "From" field
        if emailMessage.has_key("From"):
            # Get contents of From field
            from_field = emailMessage.__getitem__("From")

I also tried using the following, but got the same results:
    emailMessage = email.Parser.HeaderParser().parsestr(j, headersonly=True)

Any help would be appreciated!
6 Answers

MRAB

3/11/2010 8:14:00 PM

T wrote:
> All I'm looking to do is to download messages from a POP account and
> retrieve the sender and subject from their headers. Right now I'm 95%
> of the way there, except I can't seem to figure out how to *just* get
> the headers. Problem is, certain email clients also include headers
> in the message body (i.e. if you're replying to a message), and these
> are all picked up as additional senders/subjects. So, I want to avoid
> processing anything from the message body.
>
> Here's a sample of what I have:
>
> # For each line in message
> for j in M.retr(i+1)[1]:
> # Create email message object from returned string
> emailMessage = email.message_from_string(j)
> # Get fields
> fields = emailMessage.keys()
> # If email contains "From" field
> if emailMessage.has_key("From"):
> # Get contents of From field
> from_field = emailMessage.__getitem__("From")
>
> I also tried using the following, but got the same results:
> emailMessage =
> email.Parser.HeaderParser().parsestr(j, headersonly=True)
>
> Any help would be appreciated!

If you're using poplib then use ".top" instead of ".retr".
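
A minimal sketch of this suggestion, assuming Python 3 (where poplib returns the lines as bytes) rather than the Python 2 code quoted above. The function name `sender_and_subject` and the `conn` parameter are illustrative, not from the thread. Note that the lines returned by `top` are joined and parsed once as a whole header block, instead of being parsed one line at a time:

```python
import email.policy
from email.parser import BytesHeaderParser


def sender_and_subject(conn, msg_num):
    """Return (From, Subject) for one message, fetching only its headers.

    `conn` is assumed to be a logged-in poplib.POP3 or poplib.POP3_SSL object.
    """
    # top(which, howmuch) returns (response, lines, octets); with howmuch=0
    # the server is asked for the headers and zero lines of the body.
    _response, lines, _octets = conn.top(msg_num, 0)
    # Join the lines back into one header block and parse it in one go.
    raw = b"\r\n".join(lines)
    msg = BytesHeaderParser(policy=email.policy.default).parsebytes(raw)
    return msg["From"], msg["Subject"]
```

With a real account this would be called as `sender_and_subject(M, i + 1)` for each message number, where `M` is the connection object from the original post.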

Dave \Crash\ Dummy

3/11/2010 8:21:00 PM

On 2010-03-11, T <misceverything@gmail.com> wrote:
> All I'm looking to do is to download messages from a POP account and
> retrieve the sender and subject from their headers. Right now I'm 95%
> of the way there, except I can't seem to figure out how to *just* get
> the headers.

The headers are separated from the body by a blank line.

> Problem is, certain email clients also include headers in the message
> body (i.e. if you're replying to a message), and these are all picked
> up as additional senders/subjects. So, I want to avoid processing
> anything from the message body.

Then stop when you see a blank line.

Or retrieve just the headers.
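
The "stop at the blank line" idea could be sketched as follows. The helper name `header_lines` is hypothetical; it takes the list of lines that poplib hands back and keeps only those before the first blank one:

```python
from itertools import takewhile


def header_lines(lines):
    """Return the lines that come before the first blank line."""
    # strip() makes empty and whitespace-only lines falsy, so takewhile
    # stops at the separator between the headers and the body.
    return list(takewhile(lambda line: line.strip(), lines))
```

The function works on both bytes and str lines, since it only relies on `strip()`.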

--
Grant Edwards    grant.b.edwards at gmail.com    Yow! My life is a patio of fun!

T

3/11/2010 10:44:00 PM

On Mar 11, 3:13 pm, MRAB <pyt...@mrabarnett.plus.com> wrote:
> T wrote:
> > All I'm looking to do is to download messages from a POP account and
> > retrieve the sender and subject from their headers.  Right now I'm 95%
> > of the way there, except I can't seem to figure out how to *just* get
> > the headers.  Problem is, certain email clients also include headers
> > in the message body (i.e. if you're replying to a message), and these
> > are all picked up as additional senders/subjects.  So, I want to avoid
> > processing anything from the message body.
>
> > Here's a sample of what I have:
>
> >                 # For each line in message
> >                 for j in M.retr(i+1)[1]:
> >                     # Create email message object from returned string
> >                     emailMessage = email.message_from_string(j)
> >                     # Get fields
> >                     fields = emailMessage.keys()
> >                     # If email contains "From" field
> >                     if emailMessage.has_key("From"):
> >                         # Get contents of From field
> >                         from_field = emailMessage.__getitem__("From")
>
> > I also tried using the following, but got the same results:
> >                  emailMessage =
> > email.Parser.HeaderParser().parsestr(j, headersonly=True)
>
> > Any help would be appreciated!
>
> If you're using poplib then use ".top" instead of ".retr".

I'm still having the same issue, even with .top. Am I missing
something?

    for j in M.top(i+1, 0)[1]:
        emailMessage = email.message_from_string(j)
        #emailMessage = email.Parser.HeaderParser().parsestr(j, headersonly=True)
        # Get fields
        fields = emailMessage.keys()
        # If email contains "From" field
        if emailMessage.has_key("From"):
            # Get contents of From field
            from_field = emailMessage.__getitem__("From")

Is there another way I should be using to retrieve only the headers
(not those in the body)?

MRAB

3/11/2010 11:07:00 PM

T wrote:
> On Mar 11, 3:13 pm, MRAB <pyt...@mrabarnett.plus.com> wrote:
>> T wrote:
>>> All I'm looking to do is to download messages from a POP account and
>>> retrieve the sender and subject from their headers. Right now I'm 95%
>>> of the way there, except I can't seem to figure out how to *just* get
>>> the headers. Problem is, certain email clients also include headers
>>> in the message body (i.e. if you're replying to a message), and these
>>> are all picked up as additional senders/subjects. So, I want to avoid
>>> processing anything from the message body.
>>> Here's a sample of what I have:
>>> # For each line in message
>>> for j in M.retr(i+1)[1]:
>>> # Create email message object from returned string
>>> emailMessage = email.message_from_string(j)
>>> # Get fields
>>> fields = emailMessage.keys()
>>> # If email contains "From" field
>>> if emailMessage.has_key("From"):
>>> # Get contents of From field
>>> from_field = emailMessage.__getitem__("From")
>>> I also tried using the following, but got the same results:
>>> emailMessage =
>>> email.Parser.HeaderParser().parsestr(j, headersonly=True)
>>> Any help would be appreciated!
>> If you're using poplib then use ".top" instead of ".retr".
>
> I'm still having the same issue, even with .top. Am I missing
> something?
>
> for j in M.top(i+1, 0)[1]:
> emailMessage = email.message_from_string(j)
> #emailMessage =
> email.Parser.HeaderParser().parsestr(j, headersonly=True)
> # Get fields
> fields = emailMessage.keys()
> # If email contains "From" field
> if emailMessage.has_key("From"):
> # Get contents of From field
> from_field = emailMessage.__getitem__("From")
>
> Is there another way I should be using to retrieve only the headers
> (not those in the body)?

The documentation does say:

"""unfortunately, TOP is poorly specified in the RFCs and is
frequently broken in off-brand servers."""

All I can say is that it works for me with my ISP! :-)
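
For servers where TOP cannot be relied on, one possible workaround (a sketch, not something proposed in the thread) is to try `top` first and fall back to `retr`, cutting at the first blank line either way. The name `fetch_header_lines` is made up for illustration, and Python 3 is assumed:

```python
import poplib
from itertools import takewhile


def fetch_header_lines(conn, msg_num):
    """Fetch header lines with TOP, falling back to RETR if TOP fails."""
    try:
        _response, lines, _octets = conn.top(msg_num, 0)
    except poplib.error_proto:
        # The server rejected TOP, so download the whole message instead.
        _response, lines, _octets = conn.retr(msg_num)
    # Cut at the first blank line in both cases, in case body lines
    # were returned as well.
    return list(takewhile(lambda line: line.strip(), lines))
```

The fallback costs the bandwidth of a full download, but only for servers that do not handle TOP.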

T

3/11/2010 11:20:00 PM

Thanks for your suggestions! Here's what seems to be working - it's
basically the same thing I originally had, but first checks to see if
the line is blank

    response, lines, bytes = M.retr(i+1)
    # For each line in message
    for line in lines:
        if not line.strip():
            M.dele(i+1)
            break

        emailMessage = email.message_from_string(line)
        # Get fields
        fields = emailMessage.keys()
        # If email contains "From" field
        if emailMessage.has_key("From"):
            # Get contents of From field
            from_field = emailMessage.__getitem__("From")

Thomas Guettler

3/12/2010 1:00:00 PM

T wrote:
> Thanks for your suggestions! Here's what seems to be working - it's
> basically the same thing I originally had, but first checks to see if
> the line is blank
>
> response, lines, bytes = M.retr(i+1)
> # For each line in message
> for line in lines:
> if not line.strip():
> M.dele(i+1)
> break
>
> emailMessage = email.message_from_string(line)
> # Get fields
> fields = emailMessage.keys()
> # If email contains "From" field
> if emailMessage.has_key("From"):
> # Get contents of From field
> from_field = emailMessage.__getitem__("From")

Hi T,

wait, this code looks strange.

You delete the email if it contains an empty line? I use something like this:

message='\n'.join(connection.retr(msg_num)[1])

Your code:
emailMessage = email.message_from_string(line)
creates an email object from only *one* line!

You retrieve the whole message (you don't save bandwidth), but maybe that's
what you want.
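
To illustrate the difference, here is a small sketch in Python 3 (the thread itself uses Python 2) with a made-up sample message. When all lines are joined and parsed as one message, the parser stops reading headers at the blank line, so "From:" and "Subject:" lines quoted in the body are not reported as headers. The helper name `parse_message` and the addresses are illustrative only:

```python
import email
import email.policy


def parse_message(lines):
    """Build one message object from all the lines of a message."""
    raw = b"\r\n".join(lines)
    return email.message_from_bytes(raw, policy=email.policy.default)


# Sample data in the shape poplib returns: a list of lines without
# line endings, with a reply that quotes another message's headers.
lines = [
    b"From: alice@example.com",
    b"Subject: Re: hello",
    b"",
    b"On Monday Bob wrote:",
    b"From: bob@example.com",
    b"Subject: hello",
]
msg = parse_message(lines)
senders = msg.get_all("From")  # contains only the real From header
```

Parsing each line separately, as in the quoted code, treats every line as its own tiny message, which is why the quoted headers in the body were being picked up.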


Thomas

--
Thomas Guettler, http://www.thomas-gu...
E-Mail: guettli (*) thomas-guettler + de