
comp.lang.python

next line (data parsing)

robleachza

1/17/2008 12:55:00 AM

Hi there,
I'm struggling to find a sensible way to process a large chunk of
data--line by line, but also having the ability to move to subsequent
'next' lines within a for loop. I was hoping someone would be willing
to share some insights to help point me in the right direction. This
is not a file, so the modules and methods available for file parsing
don't apply.

I run a command on a remote host using the pexpect (pxssh) module.
The result comes back as pages and pages of pre-formatted text.
This is a pared-down example (some will recognize it as Tivoli
schedule output).

....
Job Name Run Time
Pri Start Time Dependencies
Schedule HOST #ALL_LETTERS ( ) 00:01
10 22:00(01/16/08) LTR_CLEANUP

(SITE1 LTR_DB_LETTER 00:01
10
Total 00:01

Schedule HOST #DAILY ( ) 00:44 10
18:00(01/16/08) DAILY_LTR

(SITE3 RUN_LTR14_PROC 00:20
10
(SITE1 LTR14A_WRAPPER 00:06
10 SITE3#RUN_LTR14_PROC
(SITE1 LTR14B_WRAPPER 00:04
10 SITE1#LTR14A_WRAPPER
(SITE1 LTR14C_WRAPPER 00:03
10 SITE1#LTR14B_WRAPPER
(SITE1 LTR14D_WRAPPER 00:02
10 SITE1#LTR14C_WRAPPER
(SITE1 LTR14E_WRAPPER 00:01
10 SITE1#LTR14D_WRAPPER
(SITE1 LTR14F_WRAPPER 00:03
10 SITE1#LTR14E_WRAPPER
(SITE1 LTR14G_WRAPPER 00:03
10 SITE1#LTR14F_WRAPPER
(SITE1 LTR14H_WRAPPER 00:02
10 SITE1#LTR14G_WRAPPER
Total 00:44

Schedule HOST #CARDS ( ) 00:02 10
20:30(01/16/08) STR2_D

(SITE7 DAILY_MEETING_FILE 00:01
10
(SITE3 BEHAVE_HALT_FILE 00:01
10 SITE7#DAILY_HOME_FILE
Total 00:02
....

I can iterate over each line by setting up a for loop on the data
object; no problem. But basically my intention is to locate the line
"Schedule HOST" and progressively move on to the 'next' line, parsing
out the pieces I care about, until I hit "Total", at which point I
resume the outer loop, which locates the next "Schedule HOST".
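The scan described above can be sketched directly: the pexpect result is just a string, so `str.splitlines()` turns it into lines, and a single iterator shared by both loops lets the inner scan advance the same stream the outer loop reads. A minimal sketch; `output` here is a hypothetical stand-in for the real pxssh result:

```python
# Minimal sketch: pull "Schedule HOST" ... "Total" blocks out of a string.
# `output` is a made-up stand-in for the text returned over pxssh.
output = (
    "Schedule HOST #ALL_LETTERS ( ) 00:01\n"
    "(SITE1 LTR_DB_LETTER 00:01\n"
    "Total 00:01\n"
)

records = []
lines = iter(output.splitlines())   # one iterator shared by both loops
for line in lines:
    if line.startswith("Schedule HOST"):
        block = []
        for detail in lines:        # continues where the outer loop left off
            if detail.strip().startswith("Total"):
                break               # outer loop resumes, finding the next schedule
            block.append(detail.strip())
        records.append(block)

print(records)
```

Because both loops pull from the same iterator, breaking out of the inner loop leaves the stream positioned just past "Total", which is exactly the resume-the-outer-loop behaviour described above.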

I realize this is a really basic problem, but I can't seem to
articulate my intention well enough to find documentation or examples
that have been helpful to me. I bought the Python Cookbook yesterday,
which has gotten me a lot further in some areas, but still hasn't
given me what I'm looking for. This is just a pet project to help me
reduce some of the tedious aspects of my daily tasks, so I've been
using it as a means to discover Python. I appreciate any insights that
would help set me in the right direction.

Cheers,
-Rob
52 Answers

Paul McGuire

1/17/2008 1:30:00 AM


On Jan 16, 6:54 pm, robleac...@gmail.com wrote:
> Hi there,
> I'm struggling to find a sensible way to process a large chunk of
> data--line by line, but also having the ability to move to subsequent
> 'next' lines within a for loop. [...]
>
> I run a command on a remote host using the pexpect (pxssh) module.
> The result comes back as pages and pages of pre-formatted text. [...]

Pyparsing will work on a string or a file, and will do the line-by-
line iteration for you. You just have to define the expected format
of the data. The sample code below parses the data that you posted.
From this example, you can refine the code by assigning names to the
different parsed fields, and use the field names to access the parsed
values.

More info about pyparsing at http://pyparsing.wiki....

-- Paul



from pyparsing import *

integer = Word(nums)
timestamp = Combine(Word(nums, exact=2) + ":" + Word(nums, exact=2))
dateString = Combine(Word(nums, exact=2) + "/" +
                     Word(nums, exact=2) + "/" +
                     Word(nums, exact=2))

schedHeader = (Literal("Schedule HOST") + Word("#", alphas + "_") +
               "(" + ")" + timestamp + integer +
               timestamp + "(" + dateString + ")" +
               Optional(~LineEnd() + empty + restOfLine))
schedLine = Group(Word("(", alphanums) + Word(alphanums + "_") +
                  timestamp + integer +
                  Optional(~LineEnd() + empty + restOfLine)
                  ) + LineEnd().suppress()
schedTotal = Literal("Total") + timestamp

sched = schedHeader + Group(OneOrMore(schedLine)) + schedTotal

# `data` holds the captured schedule text (e.g., the pxssh output)
from pprint import pprint
for s in sched.searchString(data):
    pprint(s.asList())
    print


Prints:

['Schedule HOST',
'#ALL_LETTERS',
'(',
')',
'00:01',
'10',
'22:00',
'(',
'01/16/08',
')',
'LTR_CLEANUP ',
[['(SITE1', 'LTR_DB_LETTER', '00:01', '10']],
'Total',
'00:01']

['Schedule HOST',
'#DAILY',
'(',
')',
'00:44',
'10',
'18:00',
'(',
'01/16/08',
')',
'DAILY_LTR ',
[['(SITE3', 'RUN_LTR14_PROC', '00:20', '10'],
['(SITE1', 'LTR14A_WRAPPER', '00:06', '10', 'SITE3#RUN_LTR14_PROC '],
['(SITE1', 'LTR14B_WRAPPER', '00:04', '10', 'SITE1#LTR14A_WRAPPER '],
['(SITE1', 'LTR14C_WRAPPER', '00:03', '10', 'SITE1#LTR14B_WRAPPER '],
['(SITE1', 'LTR14D_WRAPPER', '00:02', '10', 'SITE1#LTR14C_WRAPPER '],
['(SITE1', 'LTR14E_WRAPPER', '00:01', '10', 'SITE1#LTR14D_WRAPPER '],
['(SITE1', 'LTR14F_WRAPPER', '00:03', '10', 'SITE1#LTR14E_WRAPPER '],
['(SITE1', 'LTR14G_WRAPPER', '00:03', '10', 'SITE1#LTR14F_WRAPPER '],
['(SITE1', 'LTR14H_WRAPPER', '00:02', '10', 'SITE1#LTR14G_WRAPPER ']],
'Total',
'00:44']

['Schedule HOST',
'#CARDS',
'(',
')',
'00:02',
'10',
'20:30',
'(',
'01/16/08',
')',
'STR2_D ',
[['(SITE7', 'DAILY_MEETING_FILE', '00:01', '10'],
['(SITE3', 'BEHAVE_HALT_FILE', '00:01', '10', 'SITE7#DAILY_HOME_FILE ']],
'Total',
'00:02']
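As a lighter-weight comparison (not part of Paul's pyparsing grammar), the same header fields can also be pulled apart with a standard-library regular expression using named groups. The pattern and the group names below are illustrative assumptions for this thread's sample data, not any Tivoli-defined format:

```python
import re

# Hypothetical pattern for a "Schedule HOST ..." header; \s+ also matches
# the newline when the header wraps onto a second line, as in the sample.
header_re = re.compile(
    r"Schedule HOST\s+(?P<sched>#\w+)\s+\(\s*\)\s+"
    r"(?P<runtime>\d\d:\d\d)\s+(?P<pri>\d+)\s+"
    r"(?P<start>\d\d:\d\d)\((?P<date>[\d/]+)\)"
)

m = header_re.search("Schedule HOST #DAILY ( ) 00:44 10 18:00(01/16/08) DAILY_LTR")
print(m.group("sched"), m.group("runtime"), m.group("start"), m.group("date"))
```

Named groups give the field-name access Paul describes, though without pyparsing's tolerance for format drift.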

Scott David Daniels

1/17/2008 5:01:00 AM


robleachza@gmail.com wrote:
> I'm struggling to find a sensible way to process a large chunk of
> data--line by line, but also having the ability to move to subsequent
> 'next' lines within a for loop. [...]
>
> But basically my intention is to locate the line "Schedule
> HOST" and progressively move on to the 'next' line, parsing out the
> pieces I care about, until I hit "Total", at which point I resume
> the outer loop, which locates the next "Schedule HOST".

if you can do:

    for line in whatever:
        ...

then you can do:

    source = iter(whatever)
    for intro in source:
        if intro.startswith('Schedule '):
            for line in source:
                if line.startswith('Total'):
                    break
                process(intro, line)
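The same two-loops-one-iterator idea can also be written with `itertools.takewhile`, which consumes the terminating "Total" line as a side effect of testing it. A sketch; `text` is hypothetical sample data, not the real Tivoli output:

```python
from itertools import takewhile

# Made-up miniature input with two schedule blocks.
text = "Schedule A\njob1\njob2\nTotal\nSchedule B\njob3\nTotal\n"

blocks = []
lines = iter(text.splitlines())     # shared by the loop and takewhile
for line in lines:
    if line.startswith("Schedule "):
        # takewhile reads the failing "Total" line off the iterator too,
        # which conveniently skips it before the outer loop resumes.
        blocks.append(list(takewhile(lambda l: not l.startswith("Total"),
                                     lines)))

print(blocks)
```

Note the subtlety: because `takewhile` discards the element that fails its predicate, the "Total" delimiter is silently dropped, which happens to be what this problem wants.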

--Scott David Daniels
Scott.Daniels@Acm.Org

George Sakkis

1/17/2008 5:42:00 AM


On Jan 17, 12:01 am, Scott David Daniels <Scott.Dani...@Acm.Org>
wrote:
> robleac...@gmail.com wrote:
> > I'm struggling to find a sensible way to process a large chunk of
> > data--line by line, but also having the ability to move to subsequent
> > 'next' lines within a for loop. [...]
>
> then you can do:
>
>     source = iter(whatever)
>     for intro in source:
>         if intro.startswith('Schedule '):
>             for line in source:
>                 if line.startswith('Total'):
>                     break
>                 process(intro, line)
>
> --Scott David Daniels

Or if you use this pattern often, you may extract it to a general
grouping function such as http://aspn.activestate.com/ASPN/Cookbook/Python/Rec...:

import re

for line in iterblocks(source,
                       start=lambda line: line.startswith('Schedule HOST'),
                       end=lambda line: re.search(r'^\s*Total', line),
                       skip_delim=False):
    process(line)


George

George Sakkis

1/17/2008 5:51:00 AM


On Jan 17, 12:42 am, George Sakkis <george.sak...@gmail.com> wrote:
> [quoted thread snipped]
>
> Or if you use this pattern often, you may extract it to a general
> grouping function such as http://aspn.activestate.com/ASPN/Cookbook/Python/Rec...:

Sorry, google groups fscked up with the auto linewrapping (is there a
way to increase the line length?); here it is again:

import re

for line in iterblocks(source,
                       start=lambda line: line.startswith('Schedule HOST'),
                       end=lambda line: re.search(r'^\s*Total', line),
                       skip_delim=False):
    process(line)
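The recipe URL above is truncated, so here is a hypothetical minimal `iterblocks` matching only the call signature George uses (`start`/`end` predicates plus a `skip_delim` flag); the real ASPN recipe is more general, and this sketch is not its actual code:

```python
def iterblocks(iterable, start, end, skip_delim=True):
    """Yield the lines of each block delimited by start/end predicates.

    Hypothetical minimal reimplementation, not the actual ASPN recipe.
    With skip_delim=False the delimiter lines themselves are yielded too.
    """
    inside = False
    for item in iterable:
        if not inside:
            if start(item):
                inside = True
                if not skip_delim:
                    yield item
        elif end(item):
            inside = False
            if not skip_delim:
                yield item
        else:
            yield item

# Usage on a made-up miniature input:
sample = ["noise", "Schedule HOST #DAILY", "job line", "Total 00:44", "noise"]
got = list(iterblocks(sample,
                      start=lambda l: l.startswith("Schedule HOST"),
                      end=lambda l: l.startswith("Total"),
                      skip_delim=False))
print(got)
```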


George

robleachza

1/17/2008 4:35:00 PM


I'm very appreciative for the comments posted. Thanks to each of you.
All good stuff.
Cheers,
-Rob


On Jan 16, 9:50 pm, George Sakkis <george.sak...@gmail.com> wrote:
> [quoted thread snipped]

Slogoin

5/23/2014 7:34:00 PM


On Friday, May 23, 2014 12:20:20 PM UTC-7, dsi1 wrote:
> My idea is that once people reach a sufficient technological
> level that allows them to kill themselves off, they do.

Yeah, your idea is a little old and was in the Fermi link.

"1966 Sagan and Shklovskii speculated that technological civilizations will either tend to destroy themselves within a century of developing interstellar communicative capability or master their self-destructive tendencies and survive for billion-year timescales.[55] Self-annihilation may also be viewed in terms of thermodynamics: insofar as life is an ordered system that can sustain itself against the tendency to disorder, the "external transmission" or interstellar communicative phase may be the point at which the system becomes unstable and self-destructs.[56]"

The one thing I would never have anticipated is that nobody looks anything up.

That's the breaks!

Slogoin

5/23/2014 7:40:00 PM


On Friday, May 23, 2014 12:34:20 PM UTC-7, Slogoin wrote:
>
> That's the breaks!

Oh, also in the article was this reason:

"They are too busy online"

Now we are talking!!!!!!

dsi1

5/23/2014 8:04:00 PM


On Friday, May 23, 2014 9:34:20 AM UTC-10, Slogoin wrote:
> On Friday, May 23, 2014 12:20:20 PM UTC-7, dsi1 wrote:
> > My idea is that once people reach a sufficient technological
> > level that allows them to kill themselves off, they do.
>
> Yeah, your idea is a little old and was in the Fermi link.
>
> "1966 Sagan and Shklovskii speculated that technological civilizations
> will either tend to destroy themselves within a century of developing
> interstellar communicative capability or master their self-destructive
> tendencies..." [rest of quotation snipped]
Well, we're so proud of ourselves, aren't we? I have no doubt that I'm not the first guy that thought of this. Thanks for going through the trouble of finding the link. I got news for you. Sagan and Shklovskii ain't the first ones to come up with the idea either. That's the breaks.

> The one thing I would never have anticipated is that nobody looks anything up.
>
> That's the breaks!

Tony Done

5/23/2014 8:41:00 PM


On 5/24/2014 5:20 AM, dsi1 wrote:
> On Friday, May 23, 2014 8:27:57 AM UTC-10, Tony Done wrote:
>> On 5/23/2014 9:18 AM, Andrew Schulman wrote:
>>> http://www.huffingtonpost.com/2014/05/22/aliens-congress-seti-astronomers_n_53...
>>> Andrew
>>
>> This is an interesting background read:
>>
>> http://en.wikipedia.org/wiki/Fer...
>
> A.C. Clarke said that the distances between planetary civilizations
> are too vast for them to freely interact. My idea is that once people
> reach a sufficient technological level that allows them to kill
> themselves off, they do.
>
> Speaking of people dying off, my dad told me that there's no bees
> around his place. He's right, there's no bees around here either.
> I've heard this happening on the mainland but it's kind of odd that
> this should be happening in one of the most geographically isolated
> places on earth. How are things in your town?

I'm not sure if it's in the Wiki article, I haven't read it for a long
time, but the idea has been touted that any advanced civilisation could
fill the galaxy up with self-replicating robots. There has been plenty
of time for this to happen, but there is no evidence. The counterargument
is that any other sufficiently advanced civilisation would quickly
develop self-replicating-robot wreckers to get rid of the nuisance.
There's a fairly recent (by my standards) SF novel where the last
survivors of the human lineage are, in fact, evolving self-replicating
robots.

--
Tony Done

http://www.soundclick.com/bands/default.cfm?ban...

http://www.flickr.com/photos/do...

dsi1

5/23/2014 9:57:00 PM


On Friday, May 23, 2014 10:41:12 AM UTC-10, Tony Done wrote:
> [quoted thread snipped]
>
> ... the idea has been touted that any advanced civilisation could
> fill the galaxy up with self-replicating robots. There has been plenty
> of time for this to happen, but there is no evidence. The counterargument
> is that any other sufficiently advanced civilisation would quickly
> develop self-replicating-robot wreckers to get rid of the nuisance. [...]
Machines that are superior to humans seem to be hot in Hollywood these days. I think it could go either way. It would be a lot easier to stop aging in humans and achieve immortality, since the mechanisms and processes for self-replication are already in place.
