comp.lang.python

string split without consumption

robert

2/2/2008 12:22:00 PM

this didn't work elegantly as expected:

>>> ss
'owi\nweoifj\nfheu\n'
>>> re.split(r'\A',ss)
['owi\nweoifj\nfheu\n']
>>> re.split(r'\Z',ss)
['owi\nweoifj\nfheu\n']
>>> re.split(r'$',ss)
['owi\nweoifj\nfheu\n']
>>> re.split(r'(?s)$',ss)
['owi\nweoifj\nfheu\n']
>>> re.split(r'(?m)(?s)$',ss)
['owi\nweoifj\nfheu\n']
>>> re.split(r'(?m)$',ss)
['owi\nweoifj\nfheu\n']
>>> re.split(r'(?m)\Z',ss)
['owi\nweoifj\nfheu\n']
>>> re.split(r'(?m)\A',ss)
['owi\nweoifj\nfheu\n']
>>> re.split(r'(?s)\A',ss)
['owi\nweoifj\nfheu\n']
>>> re.split(r'(?s)(?m)\A',ss)
['owi\nweoifj\nfheu\n']
>>>


How can this be done?


Robert
8 Answers

Tim Chase

2/2/2008 1:23:00 PM

> this didn't work elegantly as expected:
>
> >>> ss
> 'owi\nweoifj\nfheu\n'
> >>> re.split(r'(?m)$',ss)
> ['owi\nweoifj\nfheu\n']

Do you have a need to use a regexp?

>>> ss.splitlines(True)
['owi\n', 'weoifj\n', 'fheu\n']

-tkc




robert

2/2/2008 3:14:00 PM

Tim Chase wrote:
>> this didn't work elegantly as expected:
>>
>> >>> ss
>> 'owi\nweoifj\nfheu\n'
>> >>> re.split(r'(?m)$',ss)
>> ['owi\nweoifj\nfheu\n']
>
> Do you have a need to use a regexp?

I'd like the general case - split without consumption.

>
> >>> ss.splitlines(True)
> ['owi\n', 'weoifj\n', 'fheu\n']
>

Thanks. Yet this does not fit "naturally" and consistently into my
line processing algorithm, which buffers further input. Compare e.g.
ss.split('\n') ..

>>> 'owi\nweoifj\nfheu\n'.split('\n')
['owi', 'weoifj', 'fheu', '']
>>> 'owi\nweoifj\nfheu\nxx'.split('\n')
['owi', 'weoifj', 'fheu', 'xx']

is consistent in that regard: there is always a last element that is
either empty or a partial line, and it can be fed directly as the
start of the next input buffer.
With the .splitlines(True/False) results you need to fiddle and test
the last result's last char... Or, with False, you cannot tell at all.
So I'd call this a "wrong" implementation.


Robert
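
For readers following the buffering argument, here is a minimal sketch of the pattern Robert describes. The helper name feed and the sample chunks are made up for illustration; only str.split('\n') comes from the thread.

```python
def feed(buffer, chunk):
    """Return (complete_lines, new_buffer) after appending chunk to buffer."""
    parts = (buffer + chunk).split('\n')
    # The last element is always the open remainder: '' if the data ended
    # with a newline, otherwise the partial line seen so far.
    return parts[:-1], parts[-1]

buffer = ''
lines = []
# Hypothetical input arriving in arbitrary chunks
for chunk in ['owi\nweo', 'ifj\nfheu\n', 'xx']:
    complete, buffer = feed(buffer, chunk)
    lines.extend(complete)

print(lines)   # ['owi', 'weoifj', 'fheu']
print(buffer)  # xx
```

The unfinished 'xx' stays in the buffer until more input arrives, with no need to inspect the last character of the last piece.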

Tim Chase

2/2/2008 3:42:00 PM

>>> this didn't work elegantly as expected:
>>>
>>> >>> ss
>>> 'owi\nweoifj\nfheu\n'
>>> >>> re.split(r'(?m)$',ss)
>>> ['owi\nweoifj\nfheu\n']
>> Do you have a need to use a regexp?
>
> I'd like the general case - split without consumption.

I'm not sure there's a one-pass regex solution to the problem
using Python's regex engine. If pre-processing was allowed, one
could do it.

>> >>> ss.splitlines(True)
>> ['owi\n', 'weoifj\n', 'fheu\n']
>>
>
> thanks. Yet this does not work "naturally" consistent in my line
> processing algorithm - the further buffering. Compare e.g.
> ss.split('\n') ..

well, one can do

>>> [line + '\n' for line in ss.splitlines()]
['owi\n', 'weoifj\n', 'fheu\n']
>>> [line + '\n' for line in (ss+'xxx').splitlines()]
['owi\n', 'weoifj\n', 'fheu\n', 'xxx\n']

as another try for your edge case. It's understandable and
natural-looking

-tkc




Jeffrey Froman

2/2/2008 4:10:00 PM

robert wrote:

> thanks. Yet this does not work "naturally" consistent in my line
> processing algorithm - the further buffering. Compare e.g.
> ss.split('\n')  ..
>
> >>> 'owi\nweoifj\nfheu\n'.split('\n')
> ['owi', 'weoifj', 'fheu', '']
> >>> 'owi\nweoifj\nfheu\nxx'.split('\n')
> ['owi', 'weoifj', 'fheu', 'xx']


Maybe this works for you?

>>> re.split(r'(\n)', ss)
['owi', '\n', 'weoifj', '\n', 'fheu', '\n', '']


Jeffrey
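
A possible follow-up to this suggestion: the alternating text/separator list can be glued back together so each separator stays attached to the piece before it. This is only a sketch; split_keep is a hypothetical helper, and it assumes the separator pattern contains no capturing groups of its own.

```python
import re

def split_keep(sep_pattern, text):
    """Split text on sep_pattern, keeping each separator with the piece before it."""
    # Wrapping the pattern in one capturing group makes re.split return
    # [text, sep, text, sep, ..., text] (always an odd number of items).
    parts = re.split('(%s)' % sep_pattern, text)
    pieces = [parts[i] + parts[i + 1] for i in range(0, len(parts) - 1, 2)]
    # The final item is the open remainder ('' if text ended with a separator).
    pieces.append(parts[-1])
    return pieces

print(split_keep(r'\n', 'owi\nweoifj\nfheu\n'))
# ['owi\n', 'weoifj\n', 'fheu\n', '']
print(split_keep(r'\n', 'owi\nweoifj\nfheu\nxx'))
# ['owi\n', 'weoifj\n', 'fheu\n', 'xx']
```

The result keeps the property of str.split('\n') that the last element is always the empty or partial line.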

robert

2/2/2008 4:16:00 PM

Tim Chase wrote:
>>>> this didn't work elegantly as expected:
>>>>
>>>> >>> ss
>>>> 'owi\nweoifj\nfheu\n'
>>>> >>> re.split(r'(?m)$',ss)
>>>> ['owi\nweoifj\nfheu\n']
>>> Do you have a need to use a regexp?
>> I'd like the general case - split without consumption.
>
> I'm not sure there's a one-pass regex solution to the problem
> using Python's regex engine. If pre-processing was allowed, one
> could do it.
>

I only found it partly with inverse logic - findall:

>>> re.findall(r'(?s).*?(?:\n|$)','owi\nweoifj\nfheu\nxx')
['owi\n', 'weoifj\n', 'fheu\n', 'xx', '']
>>> re.findall(r'(?s).*?(?:\n|$)','owi\nweoifj\nfheu\n')
['owi\n', 'weoifj\n', 'fheu\n', '']
>>>

but it's also wrong regarding partial last lines.

re.split evidently ignores empty matches such as \A, \Z, ^, $ and
\b.

>>> re.split(r'\b(?=\n)','owi\nweoifj\nfheu\n\nxx')
['owi\nweoifj\nfheu\n\nxx']


>>> >>> ss.splitlines(True)
>>> ['owi\n', 'weoifj\n', 'fheu\n']
>>>
>> thanks. Yet this does not work "naturally" consistent in my line
>> processing algorithm - the further buffering. Compare e.g.
>> ss.split('\n') ..
>
> well, one can do
>
> >>> [line + '\n' for line in ss.splitlines()]
> ['owi\n', 'weoifj\n', 'fheu\n']
> >>> [line + '\n' for line in (ss+'xxx').splitlines()]
> ['owi\n', 'weoifj\n', 'fheu\n', 'xxx\n']
>
> as another try for your edge case. It's understandable and
> natural-looking
>

Nice for some display purposes, but "wrong" regarding a general
logic. The 'xxx' is not a complete line in the general case. It's
an (open) part and should appear as such.


Robert
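
One way the findall approach could be adjusted so that it never yields an empty match, sketched under assumptions: the pattern and the helper name lines_and_rest are illustrative, not something proposed in the thread. Both alternatives consume at least one character, so no trailing '' appears, and the partial last line is returned separately.

```python
import re

# Either a (possibly empty) run of non-newlines ending in '\n',
# or a non-empty run of non-newlines at the end of the text.
LINE = re.compile(r'[^\n]*\n|[^\n]+')

def lines_and_rest(text):
    """Return (complete lines with their '\\n', open partial line or '')."""
    pieces = LINE.findall(text)
    if pieces and not pieces[-1].endswith('\n'):
        rest = pieces.pop()   # unterminated tail, not a complete line
    else:
        rest = ''
    return pieces, rest

print(lines_and_rest('owi\nweoifj\nfheu\nxx'))
# (['owi\n', 'weoifj\n', 'fheu\n'], 'xx')
print(lines_and_rest('owi\nweoifj\nfheu\n'))
# (['owi\n', 'weoifj\n', 'fheu\n'], '')
```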

robert

2/2/2008 4:18:00 PM

Jeffrey Froman wrote:
> robert wrote:
>
>> thanks. Yet this does not work "naturally" consistent in my line
>> processing algorithm - the further buffering. Compare e.g.
>> ss.split('\n') ..
>>
>> >>> 'owi\nweoifj\nfheu\n'.split('\n')
>> ['owi', 'weoifj', 'fheu', '']
>> >>> 'owi\nweoifj\nfheu\nxx'.split('\n')
>> ['owi', 'weoifj', 'fheu', 'xx']
>
>
> Maybe this works for you?
>
> >>> re.split(r'(\n)', ss)
> ['owi', '\n', 'weoifj', '\n', 'fheu', '\n', '']
>

Thanks, that's it.


Robert

Steve Holden

2/2/2008 4:33:00 PM

robert wrote:
[...]
> but it's also wrong regarding partial last lines.
>
> re.split evidently ignores empty matches such as \A, \Z, ^, $ and
> \b.
>
[...]
Or perhaps you don't understand re?

It's a tricky thing to start playing with. Look up re.MULTILINE and
re.DOTALL.

regards
Steve
--
Steve Holden +1 571 484 6266 +1 800 494 3119
Holden Web LLC http://www.hold...

robert

2/2/2008 4:53:00 PM

Steve Holden wrote:
> robert wrote:
> [...]
>> but it's also wrong regarding partial last lines.
>>
>> re.split evidently ignores empty matches such as \A, \Z, ^, $ and
>> \b.
>>
> [...]
> Or perhaps you don't understand re?
>
> It's a tricky thing to start playing with. Look up re.MULTILINE and
> re.DOTALL.
>

I also tried "all" variations with (?m) and (?s), and other things.
So far it appears to me that the line modes are not the problem:
no empty match works with split at all.

A basic case:
>>> re.split(r'\b','ab cwoe fds. fi foiewj')
['ab cwoe fds. fi foiewj']

It wants to consume at least "something":

>>> re.split(r'\b.','ab cwoe fds. fi foiewj')
['', 'b', '', 'woe', '', 'ds', ' ', 'i', '', 'oiewj']


While .findXX and .searchXXX are "all-seeing" :

>>> re.findall(r'\b','ab cwoe fds. fi foiewj')
['', '', '', '', '', '', '', '', '', '']



Robert
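
A note for readers on current interpreters: the sessions above reflect the Python of the time. Since Python 3.7, re.split also splits on empty matches, so a zero-width lookbehind gives the "split without consumption" asked for here. The sketch below assumes Python 3.7 or later; on older versions the same calls behave differently.

```python
import re

ss = 'owi\nweoifj\nfheu\n'

# Assumption: Python 3.7+, where re.split honours empty matches.
# (?<=\n) matches the empty position right after each newline,
# so nothing is consumed and every '\n' stays with its line.
print(re.split(r'(?<=\n)', ss))
# ['owi\n', 'weoifj\n', 'fheu\n', '']
print(re.split(r'(?<=\n)', ss + 'xx'))
# ['owi\n', 'weoifj\n', 'fheu\n', 'xx']

# The word-boundary case from the thread now splits as well.
print(re.split(r'\b', 'ab cwoe'))
# ['', 'ab', ' ', 'cwoe', '']
```

As with str.split('\n'), the last element is always the empty or partial line, which suits the buffering use case.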