comp.lang.python

Python dos2unix one liner

Jerry Rocteur

2/27/2010 9:37:00 AM

Hi,

This morning I am working through Building Skills in Python and was
having problems with string.strip.

Then I found the input file I was using was in DOS format and I
thought it would be best to convert it to UNIX, so I started to type perl
-i -pe 's/ and then I thought, wait, I'm learning Python, I have to
think in Python. As I'm a Python newbie I fired up Google and typed:

+python convert dos to unix +one +liner

Found perl, sed, awk but no python on the first page

So I tried

+python dos2unix +one +liner -perl

Same thing..

But then I found http://wiki.python.org/moin/Powerful%20Python%20...
and tried this:

cat file.dos | python -c "import sys,re;
[sys.stdout.write(re.compile('\r\n').sub('\n', line)) for line in
sys.stdin]" >file.unix

And it works..

[10:31:11 incc-imac-intel ~/python] cat -vet file.dos
one^M$
two^M$
three^M$
[10:32:10 incc-imac-intel ~/python] cat -vet file.unix
one$
two$
three$

But it is long and just like sed does not do it in place.

Is there a better way in Python or is this kind of thing best done in
Perl ?

Thanks,

Jerry
23 Answers

Martin P. Hellwig

2/27/2010 10:14:00 AM

On 02/27/10 09:36, @ Rocteur CC wrote:
<cut dos2unix oneliners;python vs perl/sed/awk>
Hi, a couple of fragmented things popped into my head reading your
question. None of them is very constructive for what you actually
want, but here goes anyway.

- One-line throwaway script with re as built-in syntax: yup, that
sounds like Perl to me.

- What is wrong with making an executable script (not limited to one
line) and calling that? What you type is even shorter.

- ... wait a minute, you are building something in Python (problem with
string.strip - why don't you use the built-in string strip method
instead?) which barfs on the input (win/unix line ending). Should the
actual solution not be in there, i.e. parsing the line first to check
for line endings? .. But wait another minute, why are you getting \r\n
in the first place? Doesn't Python use universal newlines by default?

Hope that helps a bit. Maybe you could post the part of the code you
are working on for some better suggestions.

--
mph

Peter Otten

2/27/2010 11:05:00 AM

@ Rocteur CC wrote:

> But then I found
> http://wiki.python.org/moin/Powerful%20Python%20...
> and tried this:
>
> cat file.dos | python -c "import sys,re;
> [sys.stdout.write(re.compile('\r\n').sub('\n', line)) for line in
> sys.stdin]" >file.unix
>
> And it works..

- Don't build list comprehensions just to throw them away, use a for-loop
instead.

- You can often use string methods instead of regular expressions. In this
case line.replace("\r\n", "\n").
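
A minimal sketch of both suggestions together, assuming Python 3 and binary streams (so that Python does not translate the line endings by itself before we see them). The name filter_stream is made up for this illustration; it is not from the wiki page or the thread:

```python
import io


def filter_stream(src, dst):
    # A plain for-loop: each line is written as soon as it is read,
    # and no throwaway list of return values is built.
    for line in src:
        # A string (here: bytes) method is enough for a literal replacement.
        dst.write(line.replace(b"\r\n", b"\n"))


# Demonstration on in-memory data instead of real files.
src = io.BytesIO(b"one\r\ntwo\r\nthree\r\n")
dst = io.BytesIO()
filter_stream(src, dst)
print(dst.getvalue())
```

Used as a shell filter, the same function could be called as filter_stream(sys.stdin.buffer, sys.stdout.buffer) after importing sys.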

> But it is long and just like sed does not do it in place.
>
> Is there a better way in Python or is this kind of thing best done in
> Perl ?

open(..., "U") ("universal" mode) converts arbitrary line endings to just
"\n"

$ cat -e file.dos
alpha^M$
beta^M$
gamma^M$

$ python -c'open("file.unix", "wb").writelines(open("file.dos", "U"))'

$ cat -e file.unix
alpha$
beta$
gamma$
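
For readers on Python 3: the "U" mode flag was deprecated there and removed in Python 3.11, and text-mode reads translate line endings by default. A rough equivalent of the one-liner above, as a sketch only; the convert helper and the UTF-8 default are assumptions for illustration, not part of the original post:

```python
import os
import tempfile


def convert(src_path, dst_path, encoding="utf-8"):
    # newline=None (the default) turns "\r\n" and "\r" into "\n" on read.
    # newline="\n" on write stops Python from translating "\n" into the
    # platform line separator (which would be "\r\n" again on Windows).
    with open(src_path, "r", newline=None, encoding=encoding) as src, \
         open(dst_path, "w", newline="\n", encoding=encoding) as dst:
        dst.writelines(src)


# Demonstration in a scratch directory.
with tempfile.TemporaryDirectory() as tmp:
    dos = os.path.join(tmp, "file.dos")
    unix = os.path.join(tmp, "file.unix")
    with open(dos, "wb") as f:
        f.write(b"alpha\r\nbeta\r\ngamma\r\n")
    convert(dos, unix)
    with open(unix, "rb") as f:
        result = f.read()

print(result)
```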

But still, if you want very short (and often cryptic) code Perl is hard to
beat. I'd say that Python doesn't even try.

Peter

Steven D'Aprano

2/27/2010 11:45:00 AM

On Sat, 27 Feb 2010 10:36:41 +0100, @ Rocteur CC wrote:

> cat file.dos | python -c "import sys,re;
> [sys.stdout.write(re.compile('\r\n').sub('\n', line)) for line in
> sys.stdin]" >file.unix

Holy cow!!!!!!! Calling a regex just for a straight literal-to-literal
string replacement! You've been infected by too much Perl coding!

*wink*

Regexes are expensive, even in Perl, but more so in Python. When you
don't need the 30 pound sledgehammer of regexes, use lightweight string
methods.

import sys; sys.stdout.write(sys.stdin.read().replace('\r\n', '\n'))

ought to do it. It's not particularly short, but Python doesn't value
extreme brevity -- code golf isn't terribly exciting in Python.

[steve@sylar ~]$ cat -vet file.dos
one^M$
two^M$
three^M$
[steve@sylar ~]$ cat file.dos | python -c "import sys; sys.stdout.write(sys.stdin.read().replace('\r\n', '\n'))" > file.unix
[steve@sylar ~]$ cat -vet file.unix
one$
two$
three$
[steve@sylar ~]$

Works fine. Unfortunately it still doesn't work in-place, although I
think that's probably a side-effect of the shell, not Python. To do it in
place, I would pass the file name:

# Tested and working in the interactive interpreter.
import sys
filename = sys.argv[1]
text = open(filename, 'rb').read().replace('\r\n', '\n')
open(filename, 'wb').write(text)


Turning that into a one-liner isn't terribly useful or interesting, but
here we go:

python -c "import sys;open(sys.argv[1], 'wb').write(open(sys.argv[1],
'rb').read().replace('\r\n', '\n'))" file

Unfortunately, this does NOT work: I suspect it is because the file gets
opened for writing (and hence emptied) before it gets opened for reading.
Here's another attempt:

python -c "import sys;t=open(sys.argv[1], 'rb').read().replace('\r\n',
'\n');open(sys.argv[1], 'wb').write(t)" file


[steve@sylar ~]$ cp file.dos file.txt
[steve@sylar ~]$ python -c "import sys;t=open(sys.argv[1], 'rb').read().replace('\r\n', '\n');open(sys.argv[1], 'wb').write(t)" file.txt
[steve@sylar ~]$ cat -vet file.txt
one$
two$
three$
[steve@sylar ~]$


Success!
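
The suspicion about the first in-place attempt can be checked directly. Python evaluates the expression being called, open(..., 'wb').write, before it evaluates the argument of the call, so the file is truncated before it is ever read. A small sketch, assuming Python 3 (hence the bytes literals) and a scratch file created with tempfile:

```python
import os
import tempfile

# Create a scratch file with DOS line endings.
fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, "wb") as f:
    f.write(b"one\r\ntwo\r\n")

# Same shape as the failing one-liner: the 'wb' open runs first and
# empties the file, so the inner read() only sees an empty file.
open(path, "wb").write(open(path, "rb").read().replace(b"\r\n", b"\n"))

size_after = os.path.getsize(path)
os.remove(path)
print(size_after)  # 0: the original content is gone
```

Reading into a variable first, as in the second attempt, avoids the problem because the read finishes before the file is reopened for writing.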

Of course, none of these one-liners are good practice. The best thing to
use is a dedicated utility, or write a proper script that has proper
error testing.
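
As a rough idea of what such a "proper script" might look like, here is a sketch assuming Python 3 (the snippets in this thread are Python 2 era; under Python 3 a file opened with 'rb' needs bytes literals). The names dos2unix, main and dos2unix.py are chosen for illustration only:

```python
import sys


def dos2unix(path):
    # Read the whole file as bytes so nothing is translated behind our back.
    with open(path, "rb") as f:
        data = f.read()
    converted = data.replace(b"\r\n", b"\n")
    if converted == data:
        return False  # nothing to do, leave the file untouched
    with open(path, "wb") as f:
        f.write(converted)
    return True


def main(argv):
    if len(argv) < 2:
        print("usage: dos2unix.py FILE [FILE ...]", file=sys.stderr)
        return 2
    status = 0
    for path in argv[1:]:
        try:
            dos2unix(path)
        except OSError as exc:
            # Missing file, permission problem, directory given, etc.
            print("dos2unix.py: %s: %s" % (path, exc), file=sys.stderr)
            status = 1
    return status
```

To run it as a command, the file would end with sys.exit(main(sys.argv)). Like the one-liners it reads the whole file into memory, which is fine for small text files but not for very large ones.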


> Is there a better way in Python or is this kind of thing best done in
> Perl ?

If by "this kind of thing" you mean text processing, then no, Python is
perfectly capable of doing text processing. Regexes aren't as highly
optimized as in Perl, but they're more than good enough for when you
actually need a regex.

If you mean "code golf" and one-liners, then, yes, this is best done in
Perl :)


--
Steven

Jerry Rocteur

2/27/2010 3:02:00 PM


On 27 Feb 2010, at 12:44, Steven D'Aprano wrote:

> On Sat, 27 Feb 2010 10:36:41 +0100, @ Rocteur CC wrote:
>
>> cat file.dos | python -c "import sys,re;
>> [sys.stdout.write(re.compile('\r\n').sub('\n', line)) for line in
>> sys.stdin]" >file.unix
>
> Holy cow!!!!!!! Calling a regex just for a straight literal-to-literal
> string replacement! You've been infected by too much Perl coding!

Thanks for the replies I'm looking at them now, however, for those who
misunderstood, the above cat file.dos pipe python does not come from
Perl but comes from:

http://wiki.python.org/moin/Powerful%20Python%20...

> Apply regular expression to lines from stdin
> [another command] | python -c "import sys,re;
> [sys.stdout.write(re.compile('PATTERN').sub('SUBSTITUTION', line))
> for line in sys.stdin]"


Nothing to do with Perl, Perl only takes a handful of characters to do
this and certainly does not require the creation of an intermediate file,
I simply found the above example on wiki.python.org whilst searching
Google for a quick conversion solution.

Thanks again for the replies I've learned a few things and I
appreciate your help.

Jerry

ssteinerX@gmail.com

2/27/2010 3:55:00 PM


On Feb 27, 2010, at 10:01 AM, @ Rocteur CC wrote:
> Nothing to do with Perl, Perl only takes a handful of characters to do this and certainly does not require the creation of an intermediate file

Perl may be better for you for throw-away code. Use Python for the code you want to keep (and read and understand later).

S

Dave "Crash" Dummy

2/27/2010 4:59:00 PM

On 2010-02-27, @ Rocteur CC <macosx@rocteur.cc> wrote:

> Nothing to do with Perl, Perl only takes a handful of characters to do
> this and certainly does not require the creation of an intermediate file,

Are you sure about that?

Or does it just hide the intermediate file from you the way
that sed -i does?
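
For illustration, the general "write a temporary file, then rename it over the original" technique looks roughly like this in Python 3. This is a sketch of the technique only, not a description of how sed or Perl implement it internally, and rewrite_in_place is a hypothetical name:

```python
import os
import tempfile


def rewrite_in_place(path):
    # Put the temporary file in the same directory as the target so the
    # final rename stays on one filesystem.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp, open(path, "rb") as src:
            for line in src:
                tmp.write(line.replace(b"\r\n", b"\n"))
        # Swap the converted copy into place under the original name.
        os.replace(tmp_path, path)
    except BaseException:
        # Conversion failed: remove the partial temp file, keep the original.
        os.unlink(tmp_path)
        raise
```

From the caller's point of view the file is changed "in place", yet an intermediate file existed the whole time. Note that this sketch does not copy the original file's permissions onto the new one.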

--
Grant

John Bokma

2/27/2010 5:27:00 PM

"ssteinerX@gmail.com" <ssteinerx@gmail.com> writes:

> On Feb 27, 2010, at 10:01 AM, @ Rocteur CC wrote:
>> Nothing to do with Perl, Perl only takes a handful of characters to
>> do this and certainly does not require the creation of an intermediate
>> file
>
> Perl may be better for you for throw-away code. Use Python for the
> code you want to keep (and read and understand later).

Amusing how long those Python toes can be. In several replies I have
noticed (often clueless) opinions on Perl. When do people learn that a
language is just a tool to do a job?

--
John Bokma j3b

Hacking & Hiking in Mexico - http://john...
http://castle... - Perl & Python Development

Alf P. Steinbach

2/27/2010 5:40:00 PM

* @ Rocteur CC:
>
> On 27 Feb 2010, at 12:44, Steven D'Aprano wrote:
>
>> On Sat, 27 Feb 2010 10:36:41 +0100, @ Rocteur CC wrote:
>>
>>> cat file.dos | python -c "import sys,re;
>>> [sys.stdout.write(re.compile('\r\n').sub('\n', line)) for line in
>>> sys.stdin]" >file.unix
>>
>> Holy cow!!!!!!! Calling a regex just for a straight literal-to-literal
>> string replacement! You've been infected by too much Perl coding!
>
> Thanks for the replies I'm looking at them now, however, for those who
> misunderstood, the above cat file.dos pipe python does not come from
> Perl but comes from:
>
> http://wiki.python.org/moin/Powerful%20Python%20...

Steven is right with the "Holy Cow" and multiple exclamation marks.

For those unfamiliar with that, just google "multiple exclamation marks", I
think that should work... ;-)

Not only is a regular expression overkill & inefficient, but the snippet also
needlessly constructs a list whose length is the number of lines.

Consider instead e.g.

<hack>
import sys; sum(int(bool(sys.stdout.write(line.replace('\r\n','\n')))) for line
in sys.stdin)
</hack>

But better, consider that it's less work to save the code in a file than copying
and pasting it in a command interpreter, and then it doesn't need to be 1 line.



>> Apply regular expression to lines from stdin
>> [another command] | python -c "import
>> sys,re;[sys.stdout.write(re.compile('PATTERN').sub('SUBSTITUTION',
>> line)) for line in sys.stdin]"
>
>
> Nothing to do with Perl, Perl only takes a handful of characters to do
> this and certainly does not require the creation of an intermediate file, I
> simply found the above example on wiki.python.org whilst searching
> Google for a quick conversion solution.
>
> Thanks again for the replies I've learned a few things and I appreciate
> your help.

Cheers,

- Alf

Steven D'Aprano

2/27/2010 5:43:00 PM

On Sat, 27 Feb 2010 16:01:53 +0100, @ Rocteur CC wrote:

> On 27 Feb 2010, at 12:44, Steven D'Aprano wrote:
>
>> On Sat, 27 Feb 2010 10:36:41 +0100, @ Rocteur CC wrote:
>>
>>> cat file.dos | python -c "import sys,re;
>>> [sys.stdout.write(re.compile('\r\n').sub('\n', line)) for line in
>>> sys.stdin]" >file.unix
>>
>> Holy cow!!!!!!! Calling a regex just for a straight literal-to-literal
>> string replacement! You've been infected by too much Perl coding!
>
> Thanks for the replies I'm looking at them now, however, for those who
> misunderstood, the above cat file.dos pipe python does not come from
> Perl but comes from:
>
> http://wiki.python.org/moin/Powerful%20Python%20...

Whether it comes from Larry Wall himself, or a Python wiki, using regexes
for a simple string replacement is like using an 80 lb sledgehammer to
crack a peanut.


>> Apply regular expression to lines from stdin [another command] | python
>> -c "import sys,re;
>> [sys.stdout.write(re.compile('PATTERN').sub('SUBSTITUTION', line)) for
>> line in sys.stdin]"

And if PATTERN is an actual regex, rather than just a simple substring,
that would be worthwhile. But if PATTERN is a literal string, then string
methods are much faster and use much less memory.
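
Anyone who wants to see the speed difference on their own machine can time both approaches. A minimal sketch using the standard timeit module; the sample text and repeat count are arbitrary choices, and the absolute numbers will vary by machine and Python version:

```python
import re
import timeit

text = "one\r\ntwo\r\nthree\r\n" * 1000
pattern = re.compile("\r\n")


def with_regex():
    return pattern.sub("\n", text)


def with_replace():
    return text.replace("\r\n", "\n")


# Both approaches must produce the same result before timing them.
assert with_regex() == with_replace()

print("re.sub     :", timeit.timeit(with_regex, number=200))
print("str.replace:", timeit.timeit(with_replace, number=200))
```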

> Nothing to do with Perl, Perl only takes a handful of characters to do
> this

I'm sure it does. If I were interested in code-golf, I'd be impressed.


> and certainly does not require the creation of an intermediate file,

The solution I gave you doesn't use an intermediate file either.

*slaps head and is enlightened*
Oh, I'm an idiot!

Since you're reading text files, there's no need to call
replace('\r\n','\n'). Since there shouldn't be any bare \r characters in
a DOS-style text file, just use replace('\r', '').

Of course, that's an unsafe assumption in the real world. But for a quick
and dirty one-liner (and all one-liners are quick and dirty), it should
be good enough.
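
To make the "unsafe assumption" concrete, here is a small sketch comparing the two replacements on a well-formed DOS string and on one that contains a bare \r. The sample strings are invented for the illustration:

```python
dos_text = "one\r\ntwo\r\n"
# A line containing a bare carriage return, e.g. from a progress display.
odd_text = "progress: 10%\rprogress: 20%\r\n"


def crlf_to_lf(s):
    return s.replace("\r\n", "\n")


def strip_cr(s):
    return s.replace("\r", "")


# On a clean DOS file both give the same answer.
assert crlf_to_lf(dos_text) == strip_cr(dos_text) == "one\ntwo\n"

# With a bare \r they differ: the first keeps it, the second deletes it
# and silently glues the two pieces of text together.
print(repr(crlf_to_lf(odd_text)))
print(repr(strip_cr(odd_text)))
```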



--
Steven

ssteinerX@gmail.com

2/27/2010 6:01:00 PM


On Feb 27, 2010, at 12:27 PM, John Bokma wrote:

> "ssteinerX@gmail.com" <ssteinerx@gmail.com> writes:
>
>> On Feb 27, 2010, at 10:01 AM, @ Rocteur CC wrote:
>>> Nothing to do with Perl, Perl only takes a handful of characters to
>>> do this and certainly does not require the creation of an intermediate
>>> file
>>
>> Perl may be better for you for throw-away code. Use Python for the
>> code you want to keep (and read and understand later).
>
> Amusing how long those Python toes can be. In several replies I have
> noticed (often clueless) opinions on Perl. When do people learn that a
> language is just a tool to do a job?

I'm not sure how "use it for what it's good for" has anything to do with toes.

I've written lots of both Python and Perl and sometimes, for one-offs, Perl is quicker, if you know it.

I sure don't want to maintain Perl applications though; even ones I've written.

When all you have is a nail file, everything looks like a toe; that doesn't mean you want to have to maintain it. Or something.

S