Asp Forum - Pythonic indentation (or: beating a dead horse)

J Haas

5/19/2009 9:35:00 PM

Greetings, folks. First time poster, so if I breach any etiquette I
sincerely apologize. I'm a bit of a Ruby Nuby who's been bouncing around
between Python and Ruby, not entirely satisfied with either and wishing
both were better. Two years ago I had no familiarity with either
language, then I quit working for Microsoft and learned the true joy of
programming in dynamic languages.

I am not a zealot and have little tolerance for zealotry, and I have no
desire to get involved in holy wars. I'm a little apprehensive that I'm
about to step into one, but here goes anyway. In general, I prefer Ruby's
computational model to Python's. I think code blocks are cool, and I love
Ruby's very flexible expressiveness. I dig the way every statement is an
expression, and being able to return a value simply by stating it rather
than using the 'return' keyword. I hate Python's reliance on global
methods like len() and filter() and map() (rather than making these
methods members of the classes to which they apply) and I absolutely
loathe its reliance on __magic__ method names. Ruby's ability to reopen
and modify _any_ class kicks ass, and any Python fan who wants to deride
"monkeypatching" can shove it. It rocks.

That being said, I think monkeypatching could use some syntactic sugar to
provide a cleaner way of referencing overridden methods, so instead of:

module Kernel
  alias oldprint print
  def print(*args)
    do_something
    oldprint *(args + [" :-)"])
  end
end

...maybe something like this:

module Kernel
  override print(*args)
    do_something
    overridden *(args + [" :-)"])
  end
end
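
For reference, Ruby releases that postdate this thread (2.0 and later) offer something close to the proposed override/overridden pair: Module#prepend puts a module in front of a class in the method lookup chain, so the new method can reach the one it replaces with `super` and no alias is needed. A minimal sketch; `Greeter`, `Smiley` and `greet` are made-up names used only for illustration:

```ruby
# A hypothetical class whose method we want to wrap.
class Greeter
  def greet(name)
    "Hello, #{name}"
  end
end

# The wrapping method lives in a module. Once the module is prepended,
# it is searched before Greeter itself during method lookup.
module Smiley
  def greet(name)
    super(name) + " :-)"   # `super` plays the role of `overridden`
  end
end

class Greeter
  prepend Smiley
end

puts Greeter.new.greet("world")
```

This only approximates the proposal: the replacement has to live in a separate module rather than in a reopened class body.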

But I digress... the purpose of this post is to talk about one of the
relatively few areas where I think Python beats Ruby, and that's
syntactically-significant indentation.

Before I get into it, let me say to those of you whose eyes are rolling
way back in your skulls that I have a proposal that will allow you to
keep your precious end keyword if you insist, and will be 100% backward
compatible with your existing code. Skip down to "My proposal is" if you
want to cut to the chase.

When I encounter engineers who don't know Python, I sometimes ask them
if they've heard anything about the language, and more often than not,
they answer, "Whitespace is significant." And more often than not, they
think that's about the dumbest idea ever ever. I used to think the same.
Then I learned Python, and now I think that using indentation to define
scope is awesome. I started looking over my code in C++ and realized that
if some nefarious person took all of my code and stripped out the braces,
I could easily write a simple script in either Python or Ruby ;-) to
restore them, because their locations would be completely unambiguous:
open braces go right before the indentation level increases, close braces
go right before it decreases. And having gotten used to this beautiful
way of making code cleaner, I hate that Ruby doesn't have it.
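
The brace-restoring script described above can be sketched in a few lines of Ruby. This is an illustration, not the poster's code. It assumes the input is indented with spaces only, starts at column 0, and that every indented run corresponds to a brace-delimited block; `restore_braces` is a made-up name.

```ruby
# Re-insert braces into brace-less, consistently indented code.
def restore_braces(source)
  out = []
  stack = [0]                        # indentation widths of the open scopes
  source.each_line(chomp: true) do |line|
    if line.strip.empty?
      out << line                    # blank lines carry no scope information
      next
    end
    indent = line[/\A */].size
    if indent > stack.last
      # Indentation increased: the previous code line opened a scope.
      last_code = out.rindex { |l| !l.strip.empty? }
      out[last_code] += " {" if last_code
      stack.push(indent)
    end
    while indent < stack.last
      # Indentation decreased: close one scope per level we left.
      stack.pop
      out << (" " * stack.last) + "}"
    end
    out << line
  end
  # Close whatever is still open at end of input.
  while stack.size > 1
    stack.pop
    out << (" " * stack.last) + "}"
  end
  out.join("\n")
end

stripped = <<~CODE
  int main()
      if (x)
          y();
      return 0;
CODE
puts restore_braces(stripped)
```

Real C++ would need more care (single-statement bodies without braces, continuation lines, preprocessor directives), so this only demonstrates the core rule.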

I've read the two-year-old thread at
http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-t...
(well, most of it, anyway) and I'll answer some of the objections raised
in it, but first let me illustrate the scope of the problem with the
output of a quick and dirty script I wrote:

> Examined 1433 files in /usr/lib/ruby/1.8.
> Total non-empty lines: 193458
> Lines consisting of NOTHING BUT THE WORD END: 31587 (a whopping 16.33%)
>
> Streaks:
> 7: 4
> 6: 37
> 5: 28
> 4: 159
> 3: 765
> 2: 4082
> 1: 16505
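
The script itself was not posted. A rough sketch of how such a count could be produced follows; `end_stats` is a made-up name, and the choice to let blank lines pass without breaking a streak is an assumption, so the numbers it yields need not match the ones above.

```ruby
# Count non-empty lines, lines that are nothing but `end`, and runs of
# consecutive `end` lines, across a list of source strings.
def end_stats(sources)
  total = 0
  ends = 0
  streaks = Hash.new(0)              # streak length => number of occurrences
  sources.each do |text|
    run = 0
    text.each_line do |line|
      stripped = line.strip
      next if stripped.empty?        # only non-empty lines are counted
      total += 1
      if stripped == "end"
        ends += 1
        run += 1
      else
        streaks[run] += 1 if run > 0
        run = 0
      end
    end
    streaks[run] += 1 if run > 0     # a file usually finishes on a streak
  end
  { total: total, ends: ends, streaks: streaks }
end

sample = ["def a\n  if b\n    c\n  end\nend\n", "x\nend\n"]
stats = end_stats(sample)
puts "Total non-empty lines: #{stats[:total]}"
puts format("Lines consisting of nothing but 'end': %d (%.2f%%)",
            stats[:ends], 100.0 * stats[:ends] / stats[:total])
stats[:streaks].sort.reverse.each { |len, count| puts "#{len}: #{count}" }
```

To run it over a library tree, the sample array could be replaced with something like `Dir.glob("/some/path/**/*.rb").map { |f| File.read(f) }`, where the path is whatever directory is being examined.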

My friends, when ONE OUT OF EVERY SIX of your code lines consists of just
the word "end", you have a problem with conciseness. I recognize that
syntactically-significant indentation is not perfect, and it would bring
a few pain points with it. But let me say that again: ONE OUT OF EVERY
SIX LINES, for crying out loud! This should be intolerable to engineers
who value elegance. "Streaks" means what you'd expect: there are four
places in the scanned files that look like this:

            end
          end
        end
      end
    end
  end
end

This is *not* DRY. Or anything remotely resembling it. This is an ugly
blemish on a language that otherwise is very beautiful. The problem of
endless ends is exacerbated by Ruby's expressiveness, which lends itself
to very short methods, which can make defs and ends take up a large
amount of space relative to lines of code that actually do something.

Even if you can find some ways in which the explicit "end" keyword is
preferable to letting indentation do the talking... one out of every six
lines.

Matz's objections in the cited thread were:

* tab/space mixture

Well, tough. Programmers shouldn't be using freakin' tabs anyway, and if
they are, they _definitely_ shouldn't be mixing them with spaces. I don't
think it's worthwhile to inflate the code base by a staggering 20% to
accommodate people who want to write ugly code, mixing tabs and spaces
when there's no reason to. And if for some reason it's really, really
critical to support this use case, there could be some kernel-level
method for specifying how many spaces a tab equates to, so the
interpreter can figure out just how far indented that line with the tabs
is.

* templates, e.g. eRuby

Not having used eRuby and therefore not being very familiar with it, I
don't want to comment on specifics other than to note that plenty of
Python-based template systems manage to get by.

* expression with code chunk, e.g. lambdas and blocks

I don't really see the problem. My blocks are generally indented relative
to the context to which they're being passed, isn't that standard?

My proposal is to, first, not change a thing with respect to existing
syntax. Second, steal the : from Python and use it to signify a scope
that's marked by indentation:

while some_condition
  # this scope will terminate with an 'end' statement
  do_something
end

while some_condition:
  # this scope will terminate when the indentation level decreases to the
  # level before it was entered
  do_something

%w{foo bar baz}.each do |val|
  print val
end

%w{foo bar baz}.each do |val|:
  print val

A valid objection that was raised in the earlier thread was regarding a
quick and easy and common debugging technique: throwing in print
statements

def do_something(a, b, c)
print a, b, c # for debugging purposes
  a + b + c
end

def do_something(a, b, c):
print a, b, c # error! unexpected indentation level
  a + b + c
end

We can get around this by saying that braces, wherever they may appear,
always define a new scope nested within the current scope, regardless of
indentation.

def do_something(a, b, c):
{ print a, b, c } # this works
  a + b + c

Alternatively, perhaps a character that's not normally valid at the start
of a line (maybe !) could tell the interpreter "treat this line as though
it were indented to the level of the current scope":

def do_something(a, b, c):
!print a, b, c
  a + b + c

Well, I think that's probably enough for my first post. Thank you for
your time, and Matz, thanks for the language. Thoughts, anyone?

--J

233 Answers

Joel VanderWerf

5/19/2009 9:59:00 PM

J Haas wrote:
> My friends, when ONE OUT OF EVERY SIX of your code lines consists of
> just the

John McCain, is that you? ;)

> word "end", you have a problem with conciseness. I recognize that
> syntactically-

A line consisting of just /\s+end/ is of very low complexity, however.

> significant indentation is not perfect, and it would bring a few pain
> points with it. But let me say that again: ONE OUT OF EVERY SIX LINES,
> for crying out loud! This should be intolerable to engineers who value
> elegance. "Streaks" means what you'd expect: there are four places in
> the scanned files that look like this:
>
> end
> end
> end
> end
> end
> end
> end
>
> This is *not* DRY. Or anything remotely resembling it. This is an
> ugly blemish on a language that otherwise is very beautiful.

It's a blemish all right, but not on the language.

--
vjoel : Joel VanderWerf : path berkeley edu : 510 665 3407

Benjamin Kudria

5/19/2009 10:20:00 PM

On Tue, May 19, 2009 at 17:40, J Haas <Myrdred@gmail.com> wrote:
>
> Well, tough. Programmers shouldn't be using freakin' tabs anyway, and
> if they are, they _definitely_ shouldn't be mixing them with spaces. I don't
> think it's worthwhile to inflate the code base by a staggering 20% to
> accommodate people who want to write ugly code, mixing tabs and spaces when
> there's no reason to. And if for some reason it's really, really critical to
> support this use case, there could be some kernel-level method for specifying how
> many spaces a tab equates to, so the interpreter can figure out just how
> far indented that line with the tabs is.

Well, tough?

I like to use tabs to indent, and spaces to align. (a la
http://www.emacswiki.org/emacs/Intel... )

Don't like it? Well, tough.

All kidding aside, who cares? Write a preprocessor.

Benjamin Kudria
--
http://ben.... | Jabber: ben@kudria.net

J Haas

5/19/2009 10:20:00 PM

Well, I found one of those breaches of etiquette I was worried
about... apparently wrapping my lines at 80 characters was not a good
idea. Sigh.

On May 19, 2:59 pm, Joel VanderWerf <vj...@path.berkeley.edu> wrote:
> John McCain, is that you? ;)

C'mon, you know perfectly well that John McCain can't use a
computer. :P

> > word "end", you have a problem with conciseness. I recognize that
> > syntactically-
>
> A line consisting of just /\s+end/ is of very low complexity, however.

Takes up just as much vertical space on the screen as the most complex
line you'll ever see. And even so... very low complexity or not, it's
_unnecessary_, which means any degree of complexity above zero is bad.

> > This is *not* DRY. Or anything remotely resembling it. This is an
> > ugly blemish on a language that otherwise is very beautiful.
>
> It's a blemish all right, but not on the language.

If not the language, then where? In the library code? Maybe those four
places where "end" is repeated seven consecutive times are poorly
engineered and could be refactored, but how about the nearly thousand
times "end" is repeated three or more times? Is every one of those the
result of poor engineering on the part of the library programmers, or
were at least some of them forced on the programmers by the language?

One statistic that I didn't print out from my script was that there
are an average of 135 lines of "end" per file. For a language that
prides itself on expressiveness and brevity, this is just plain silly.

--J

Tony Arcieri

5/19/2009 10:23:00 PM

[Note: parts of this message were removed to make it a legal post.]

Let me get right to the heart of the issue here. It really comes down to
this:

On Tue, May 19, 2009 at 3:40 PM, J Haas <Myrdred@gmail.com> wrote:

> I think code blocks are cool, and I love Ruby's very flexible
> expressiveness. I dig the way every statement is an expression

These are both incompatible with a Python-style indentation sensitive
syntax. You can have the Pythonic indent syntax or a purely expression
based grammar with multi-line blocks. You can't have both.

In Python, all indent blocks are statements. This is why Python can't have
multi-line lambdas using Python's indent rules: lambdas are only useful as
expressions, but all indent blocks in Python are statements. The same issue
carries over to blocks, as a good deal of the time you want a method which
takes a block to return a value (e.g. map, inject, filter, sort, grep)

There's quite an interesting interplay of design decisions to make Python's
indent-sensitive grammar work the way it does. Indent blocks in Python have
no terminator token, whereas every expression in a Ruby-like grammar must be
terminated with ";" or a newline. This works because Python's expressions
are a subset of its statements, so it can have different rules for
statements versus expressions.

Implicit line joining works in Python because the only syntactic
constructions which can exist surrounded in [...] (...) {...} tokens are
expressions, so you can't put an indent block inside of these. If you have
an indent-sensitive Ruby with implicit line joining, you limit the
expressiveness of what you can do inside any syntactic constructs enclosed
by these tokens.

If you want to have indent blocks in a purely expression-based grammar, you
need to use a syntax more like Haskell. I've seen a somewhat Python-looking
language called Logix which uses Haskell's indent rules. It was created by
Tom Locke, who has since gone on to author Hobo in Ruby, and for what it's
worth now says he prefers Ruby's syntax. Go figure.

P.S. I tried to make a Ruby-like language with an indentation-sensitive
syntax. These are the lessons I learned. I gave up and added an "end"
keyword.

--
Tony Arcieri
medioh.com

J Haas

5/19/2009 10:28:00 PM

On May 19, 3:20 pm, Benjamin Kudria <b...@kudria.net> wrote:
> Well, tough?

I could probably have found a more tactful way of putting this. Sorry.

> I like to use tabs to indent, and spaces to align. (a la http://www.emacswiki.org/emacs/Intel...)

This wouldn't be a problem, at least it's not a problem in Python and
needn't be a problem in Ruby. Having an unclosed paren, bracket, or
brace results in automatic line continuation and you can put whatever
combination of spaces and tabs you'd like on the next line. It'll be
logically considered part of the line before.

Also, I should add that mixing tabs and spaces would only be a problem
if you did something like this: (leading dots represent spaces)

.......while some_condition:
\t\tdo_something # interpreter can't tell indentation level here

You could freely mix tabs and spaces as long as they match up from the
start:

.......while some_condition:
.......\tdo_something # interpreter can tell that this indentation level is "one more" than previous
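
The rule described here needs no tab width at all: it is enough to compare the raw leading whitespace of the two lines as strings. A small sketch, with `compare_indent` as a made-up helper name:

```ruby
# Compare the leading whitespace of two lines without assuming a tab width.
# Returns :same, :deeper, :shallower, or :ambiguous.
def compare_indent(outer_line, inner_line)
  outer = outer_line[/\A[ \t]*/]
  inner = inner_line[/\A[ \t]*/]
  if inner == outer
    :same
  elsif inner.start_with?(outer)
    :deeper        # same prefix plus something extra: one level further in
  elsif outer.start_with?(inner)
    :shallower
  else
    :ambiguous     # e.g. seven spaces versus two tabs
  end
end

p compare_indent("       while some_condition:", "       \tdo_something")  # => :deeper
p compare_indent("       while some_condition:", "\t\tdo_something")       # => :ambiguous
```

An interpreter built this way could raise an error on `:ambiguous` instead of guessing how wide a tab is.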

> All kidding aside, who cares? Write a preprocessor.

Don't need to; it's already been done. But I'd rather see the language
improved.

--J

J Haas

5/19/2009 10:50:00 PM

On May 19, 3:23 pm, Tony Arcieri <t...@medioh.com> wrote:
> On Tue, May 19, 2009 at 3:40 PM, J Haas <Myrd...@gmail.com> wrote:
> > I think code blocks are cool, and I love Ruby's very flexible
> > expressiveness. I dig the way every statement is an expression
>
> These are both incompatible with a Python-style indentation sensitive
> syntax. You can have the Pythonic indent syntax or a purely expression
> based grammar with multi-line blocks. You can't have both.

I'm having a hard time following why. Can you provide an example of a
Ruby snippet that couldn't be done with scoping defined by
indentation?

> In Python, all indent blocks are statements. This is why Python can't have
> multi-line lambdas using Python's indent rules: lambdas are only useful as
> expressions, but all indent blocks in Python are statements.

This seems like a problem with Python, not a problem with indentation.

> The same issue
> carries over to blocks, as a good deal of the time you want a method which
> takes a block to return a value (e.g. map, inject, filter, sort, grep)

Again, I would really like to see an example of the sort of thing
you'd want to do here that simply requires "end" to work.

> There's quite an interesting interplay of design decisions to make Python's
> indent-sensitive grammar work the way it does. Indent blocks in Python have
> no terminator token, whereas every expression in a Ruby-like grammar must be
> terminated with ";" or a newline.

Well, every expression in a Ruby-like grammar must be terminated by a
token. What that token must be depends on the grammar. Why not
something like this? (and please forgive the highly unorthodox
pseudocode syntax)

parse_line_indent:
  if indentation = previous_line_indentation: do_nothing
  if indentation > previous_line_indentation:
    push_indentation_to_indent_stack_and_enter_new_scope
  if indentation < previous_line_indentation:
    while indentation < top_of_indent_stack:
      insert_backtab_token # here's your statement terminator
      pop_top_of_indent_stack
    if indentation != top_of_indent_stack: raise IndentationError

In other words, the parser treats an indentation level less than the
indentation level of the previous line as a statement-terminating
token.
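
The same idea can be written as runnable Ruby. This is a sketch under stated assumptions: indentation is spaces only, blank lines are skipped, and the token names (`:indent`, `:dedent`, `:line`) and the `IndentationError` class are invented for the example. The loop pops the stack while its top is deeper than the current line, emitting one terminator token per scope that was left.

```ruby
# Raised when a line dedents to a column that no enclosing scope uses.
class IndentationError < StandardError; end

# Turn leading whitespace into explicit :indent / :dedent tokens.
def indent_tokens(source)
  stack = [0]                        # indentation widths of the open scopes
  tokens = []
  source.each_line(chomp: true) do |line|
    next if line.strip.empty?
    indent = line[/\A */].size
    if indent > stack.last
      stack.push(indent)             # entering a new scope
      tokens << :indent
    else
      while indent < stack.last
        stack.pop
        tokens << :dedent            # the statement terminator
      end
      if indent != stack.last
        raise IndentationError, "no enclosing scope at column #{indent}"
      end
    end
    tokens << [:line, line.strip]
  end
  (stack.size - 1).times { tokens << :dedent }   # close scopes open at EOF
  tokens
end

p indent_tokens("while a:\n  while b:\n    c\nd\n")
```

A parser would then consume `:dedent` wherever the existing grammar expects `end`.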

> Implicit line joining works in Python because the only syntactic
> constructions which can exist surrounded in [...] (...) {...} tokens are
> expressions, so you can't put an indent block inside of these. If you have
> an indent-sensitive Ruby with implicit line joining, you limit the
> expressiveness of what you can do inside any syntactic constructs enclosed
> by these tokens.

This sorta makes sense but I'd really like to see a concrete example
of what you're talking about. It doesn't seem like this would be an
insurmountable difficulty but it's hard to say without the example.

> If you want to have indent blocks in a purely expression-based grammar, you
> need to use a syntax more like Haskell.

Being completely unfamiliar with Haskell (functional programming's
never been my strong suit) I can't really comment.

> P.S. I tried to make a Ruby-like language with an indentation-sensitive
> syntax. These are the lessons I learned. I gave up and added an "end"
> keyword.

I'll be glad to take the benefit of your practical experience, but at
the risk of seriously violating DRY, some sort of demonstration of
something that you can do with "end" but couldn't do with indentation
would be nice.

--J

Tony Arcieri

5/19/2009 11:41:00 PM

On Tue, May 19, 2009 at 4:50 PM, J Haas <Myrdred@gmail.com> wrote:

> I'm having a hard time following why. Can you provide an example of a
> Ruby snippet that couldn't be done with scoping defined by
> indentation?
>

A multi-line block returning a value, e.g.

foo = somemethod do |arg1, arg2, arg3|
  x = do_something arg1
  y = do_something_else x, arg2
  and_something_else_again y, arg3
end

Or for that matter, a multi-line lambda:

foo = lambda do |arg1, arg2, arg3|
  x = do_something arg1
  y = do_something_else x, arg2
  and_something_else_again y, arg3
end

I'm sure you're aware the "multi-line lambda" problem is somewhat infamous
in the Python world. Guido van Rossum himself has ruled it an "unsolvable
problem" because of the statement-based nature of Python indent blocks.
Lambdas must be expressions or they are worthless, and there is no way to
embed an indent block inside of a Python expression.
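
To make the point concrete, here is a runnable version of the two snippets above. The method body and the arithmetic are stand-ins invented for the example; what matters is that each multi-line block sits on the right-hand side of an assignment and its last expression becomes the value.

```ruby
# Hypothetical method that yields three arguments and returns the block's value.
def somemethod
  yield 1, 2, 3
end

foo = somemethod do |arg1, arg2, arg3|
  x = arg1 * 10          # stand-in for do_something
  y = x + arg2           # stand-in for do_something_else
  y * arg3               # last expression is the block's value
end

bar = lambda do |arg1, arg2, arg3|
  x = arg1 * 10
  y = x + arg2
  y * arg3
end

puts foo
puts bar.call(1, 2, 3)
```

Both statements are single expressions that happen to span several lines, which is the case an indentation-only syntax has to be able to represent.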

And a bit of supplemental information: I conducted a poll of what Rubyists'
favorite features are in the language. Blocks were #1 by a wide margin.

> This seems like a problem with Python, not a problem with indentation.
>

As I said, a Haskell-like syntax would facilitate including indent blocks in
a purely expression-based grammar. It's Python's statement-structured
syntax that's incompatible. However the sort of syntax you would get from a
Haskell-like approach is going to be different than Python's.

You can have a look at Logix, which is a purely expression based language
which tries to mimic Python's syntax while using Haskell-styled indent
rules. This is about the best you can do:

http://web.archive.org/web/20060517203300/www.livelogix.net/logix/tutorial/3-Introduction-For-Python-...

> Well, every expression in a Ruby-like grammar must be terminated by a
> token. [... snip ...]

> In other words, the parser treats an indentation level less than the
> indentation level of the previous line as a statement-terminating
> token.
>

Because there are statements which contain multiple indent blocks, such as
if or try/catch. If you wanted to carry over Rubyisms, this would include
the case statement, e.g.

case foo
when bar
  ...
when baz
  ...

Therefore you can't just treat a "dedent" as a statement terminator, because
a single statement may itself contain multiple "dedent" tokens.

The best solution I could think of for this was a syntactically relevant
blank line, which sucks. It also requires lexer logic more complex than
Python to handle the case of a syntactically relevant newline, which in turn
pollutes the grammar.

> > Implicit line joining works in Python because the only syntactic
> > constructions which can exist surrounded in [...] (...) {...} tokens are
> > expressions, so you can't put an indent block inside of these. If you
> > have an indent-sensitive Ruby with implicit line joining, you limit the
> > expressiveness of what you can do inside any syntactic constructs
> > enclosed by these tokens.
>
> This sorta makes sense but I'd really like to see a concrete example
> of what you're talking about. It doesn't seem like this would be an
> insurmountable difficulty but it's hard to say without the example.
>

This is valid Ruby:

on_some_event(:something, :filter => proc do
  something_here
  another_thing_here
  etc
end)

Implicit line joining removes any newline tokens inside of (...) [...] {...}
type syntactic constructions. So it becomes impossible to embed anything
with an indent block inside of expressions enclosed in any of these tokens.

And now we've hit an entirely new can of worms: how do you make implicit
line joining work when parens are optional?

--
Tony Arcieri
medioh.com

Eric Hodel

5/19/2009 11:49:00 PM

On May 19, 2009, at 15:25, J Haas wrote:
>>> This is *not* DRY. Or anything remotely resembling it. This is an
>>> ugly blemish on a language that otherwise is very beautiful.
>>
>> It's a blemish all right, but not on the language.
>
> If not the language, then where? In the library code? Maybe those four
> places where "end" is repeated seven consecutive times are poorly
> engineered and could be refactored,

They almost certainly could be, this is a sign of strong code-smel

> but how about the nearly thousand
> times "end" is repeated three or more times? Is every one of those the
> result of poor engineering on the part of the library programmers, or
> were at least some of them forced on the programmers by the language?
>
> One statistic that I didn't print out from my script was that there
> are an average of 135 lines of "end" per file. For a language that
> prides itself on expressiveness and brevity, this is just plain silly.

Does anybody complain about terminating '}' in C, C++ or Java? Does
anybody complain about terminating '.' on sentences? (There's a
folowing capital letter for disambiguation!) I think we need to
remove all useles constructs from all languages

Benjamin Kudria

5/19/2009 11:59:00 PM

On Tue, May 19, 2009 at 19:49, Eric Hodel <drbrain@segment7.net> wrote:
> Does anybody complain about terminating '}' in C, C++ or Java?

Python programmers?

:-)

> Does anybody
> complain about terminating '.' on sentences? (There's a following capital
> letter for disambiguation!)

I agree with your point, but I don't think this argument helps - computer
languages and human languages are two fairly distinct classes, with
different origins, requirements, and, um...parsers.

In my opinion they aren't always comparable.

Ben Kudria
--
http://ben.... | Jabber: ben@kudria.net

Eric Hodel

5/20/2009 12:12:00 AM

On May 19, 2009, at 16:58, Benjamin Kudria wrote:

> On Tue, May 19, 2009 at 19:49, Eric Hodel <drbrain@segment7.net>
> wrote:
>> Does anybody complain about terminating '}' in C, C++ or Java?
>
> Python programmers?
>
> :-)
>
>> Does anybody
>> complain about terminating '.' on sentences? (There's a following
>> capital
>> letter for disambiguation!)
>
> I agree with your point, but I don't think this argument helps - computer
> languages and human languages are two fairly distinct classes, with
> different origins, requirements, and, um...parsers.
>
> In my opinion they aren't always comparable.

I may have ben too sutle Maybe your email program spel-chex I
certainly didnt have useles double leters in my original

comp.lang.ruby
