comp.lang.python

Parse specific text in email body to CSV file

dpw.asdf

3/8/2008 10:21:00 PM

I have been searching all over for a solution to this. I am new to
Python, so I'm a little lost. Any pointers would be a great help. I
have a couple hundred emails that contain data I would like to
incorporate into a database or CSV file. I want to search the email
for specific text.

The emails basically look like this:



random text _important text:_15648 random text random text random text
random text
random text random text random text _important text:_15493 random text
random text
random text random text _important text:_11674 random text random text
random text
===============Date: Wednesday March 5, 2008================
name1: 15 name5: 14

name2: 18 name6: 105

name3: 64 name7: 2

name4: 24 name8: 13



I want information like "name1: 15" to be placed into the CSV with the
name "name1" and the value "15". The same goes for the date and
"_important text:_15493".

I would like to use this CSV or database to plot a graph with the
data.

Thanks!
2 Answers

Paul McGuire

3/9/2008 4:31:00 AM

This kind of work can be done using pyparsing. Here is a starting
point for you:

from pyparsing import Word, oneOf, nums, Combine
import calendar

text = """
random text _important text:_15648 random text random text random
text
random text
random text random text random text _important text:_15493 random
text
random text
random text random text _important text:_11674 random text random
text
random text
===============Date: Wednesday March 5, 2008================
name1: 15 name5: 14

name2: 18 name6: 105

name3: 64 name7: 2

name4: 24 name8: 13
"""

# An unsigned integer: one or more digits
integer = Word(nums)

IMPORTANT_TEXT = "_important text:_" + integer("value")
# calendar.month_name[0] is an empty string, so skip it
monthName = oneOf(list(calendar.month_name)[1:])
dayName = oneOf(list(calendar.day_name))
date = dayName("dayOfWeek") + monthName("month") + integer("day") + "," + integer("year")
# Parentheses let the expression continue on the next line
DATE = (Word("=").suppress() + "Date:" + date("date")
        + Word("=").suppress())
NAMEDATA = Combine("name" + integer)("name") + ":" + integer("value")

# Scan the text and report every match of any of the three patterns
for match in (IMPORTANT_TEXT | DATE | NAMEDATA).searchString(text):
    print(match.dump())

Prints:

['_important text:_', '15648']
- value: 15648
['_important text:_', '15493']
- value: 15493
['_important text:_', '11674']
- value: 11674
['Date:', 'Wednesday', 'March', '5', ',', '2008']
- date: ['Wednesday', 'March', '5', ',', '2008']
  - day: 5
  - dayOfWeek: Wednesday
  - month: March
  - year: 2008
- day: 5
- dayOfWeek: Wednesday
- month: March
- year: 2008
['name1', ':', '15']
- name: name1
- value: 15
['name5', ':', '14']
- name: name5
- value: 14
['name2', ':', '18']
- name: name2
- value: 18
['name6', ':', '105']
- name: name6
- value: 105
['name3', ':', '64']
- name: name3
- value: 64
['name7', ':', '2']
- name: name7
- value: 2
['name4', ':', '24']
- name: name4
- value: 24
['name8', ':', '13']
- name: name8
- value: 13

Find out more about pyparsing at http://pyparsing.wiki....

-- Paul
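
The example above works on a hard-coded `text` string. For real messages, the body has to be pulled out of each email first. Here is a minimal sketch using the standard library `email` package, assuming each message is available as a raw string (for example, read from a saved file) and has a plain-text part. The function name `body_text` and the sample message are made up for illustration.

```python
import email
from email import policy


def body_text(raw_message):
    """Return the plain-text body of a raw email message string."""
    msg = email.message_from_string(raw_message, policy=policy.default)
    # Pick the text/plain part; returns None if the message has none
    part = msg.get_body(preferencelist=("plain",))
    return part.get_content() if part is not None else ""


# Hypothetical sample message, for illustration only
raw = (
    "From: someone@example.com\n"
    "Subject: daily numbers\n"
    "\n"
    "random text _important text:_15648 random text\n"
    "name1: 15 name5: 14\n"
)

text = body_text(raw)
print(text)
```

The resulting `text` can then be passed to whichever parser you prefer.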

Miki

3/9/2008 6:26:00 PM

Hello,
import re

# `text` holds the body of one email
# "_label:_number" entries
for match in re.finditer(r"_([\w ]+):_(\d+)", text):
    print(match.group(1), match.group(2))

# The date between the "=" rules
for match in re.finditer(r"Date: ([^=]+)=", text):
    print(match.group(1))

# "name: number" pairs
for match in re.finditer(r"(\w+): (\d+)", text):
    print(match.group(1), match.group(2))


Now you have two problems :)
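
To get from these matches to the CSV file the question asks for, the results can be fed to the standard library `csv` module. This is a rough sketch, assuming one (name, value) row per match. The helper `extract_rows`, the column names, and the shortened sample text are illustrative and not part of the original replies.

```python
import csv
import io
import re


def extract_rows(text):
    """Return (name, value) pairs found in one email body."""
    rows = []
    # Date line, e.g. "=====Date: Wednesday March 5, 2008====="
    date = re.search(r"Date: ([^=]+)=", text)
    if date:
        rows.append(("date", date.group(1).strip()))
    # Tagged numbers, e.g. "_important text:_15648"
    for m in re.finditer(r"_([\w ]+):_(\d+)", text):
        rows.append((m.group(1), m.group(2)))
    # Name/value pairs, e.g. "name1: 15"
    for m in re.finditer(r"(\w+): (\d+)", text):
        rows.append((m.group(1), m.group(2)))
    return rows


# Shortened sample in the format shown in the question
sample = """\
random text _important text:_15648 random text
===============Date: Wednesday March 5, 2008================
name1: 15 name5: 14
"""

rows = extract_rows(sample)

# Write to an in-memory buffer so the sketch has no side effects
out = io.StringIO()
writer = csv.writer(out)
writer.writerow(["name", "value"])
writer.writerows(rows)
csv_text = out.getvalue()
print(csv_text)
```

To write a real file, swap the `io.StringIO()` buffer for `open("data.csv", "w", newline="")`, where `data.csv` is a placeholder filename. The date value contains a comma, so the `csv` module quotes it automatically.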

HTH,
--
Miki <miki.tebeka@gmail.com>
http://pythonwise.bl...