comp.lang.python

Indentation and optional delimiters

Bearophile

2/26/2008 1:37:00 PM

This is the best praise of semantic indentation I have read so far, by
Chris Okasaki:
http://okasaki.blogspot.com/2008/02/in-praise-of-mandatory-indentatio...

A quotation:
>Imagine my surprise when I started teaching this language and found the students picking it up faster than any language I had ever taught before. As fond as I am of the language, I'm certainly under no illusions that it's the ultimate teaching language. After carefully watching the kinds of mistakes the students were and were not making, I gradually realized that the mandatory indentation was the key to why they were doing better.<

I appreciated that article, and I have personally seen how fast
students learn Python basics compared to other languages, but I think
it's far more than just indentation that makes the Python language so
quick to learn [see appendix].

I liked indentation-based block delimiting for years before finding
Python, and this article tells me that it may be a good thing for
other languages too, despite some disadvantages (though it's unlikely
that existing languages, like D, will change). Some people have
actually tried it in other languages:
http://people.csail.mit.edu/mikelin/...
So I may try to write something similar for another language too.

One of the most common complaints about it is this written by James on
that blog:
>I prefer explicit delimiters because otherwise the line wrapping of code by various email programs, web mail, mailing list digesters, newsgroup readers, etc., often results in code that no longer works.<

A possible solution to this problem is "optional delimiters". What's
the path of least resistance to implementing such "optional
delimiters"? Using comments: for example #} or #: or something
similar. If you use such symbols in a systematic way, you get cheap
"optional delimiters", for example:

def insort_right(a, x, lo=0, hi=None):
    if hi is None:
        hi = len(a)
    #}
    while lo < hi:
        mid = (lo + hi) // 2
        if x < a[mid]:
            hi = mid
        #}
        else:
            lo = mid+1
        #}
    #}
    a.insert(lo, x)
#}

It looks a bit ugly, but a script can take such code even when it
arrives flattened:

def insort_right(a, x, lo=0, hi=None):
if hi is None:
hi = len(a)
#}
while lo < hi:
mid = (lo + hi) // 2
if x < a[mid]:
hi = mid
#}
else:
lo = mid+1
#}
#}
a.insert(lo, x)
#}

And rebuild the original Python code (the opposite direction is
possible too, but it requires a slightly more complex script). Such #}
markers may even become a "standard" for Python (a convention, not
something enforced by the compiler; what I'd like the Python compiler
to enforce is a syntax error when a module mixes tabs and spaces), so
it's easy to find the re-indenter script when you run into flattened
code in an email, a web page, etc.
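
A minimal sketch of such a re-indenter script (the function name and
the four-space indent are my own choices; it only handles the simple
"#}" convention shown above, where a line ending in ":" opens a block
and a "#}" line closes one):

```python
def reindent(flat_src, tab="    "):
    # Rebuild indentation from flattened code that uses '#}' markers:
    # a line ending in ':' opens a block, a '#}' line closes the
    # innermost open block.
    out = []
    depth = 0
    for line in flat_src.splitlines():
        stripped = line.strip()
        if stripped == "#}":
            depth -= 1                       # close the innermost block
            out.append(tab * depth + stripped)
            continue
        out.append(tab * depth + stripped)
        if stripped.endswith(":"):
            depth += 1                       # a header line opens a block
    return "\n".join(out)
```

Feeding it the flattened insort_right above yields the indented
version, and since "#}" is an ordinary comment the result is valid
Python as-is.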

-------------------------------

Appendix:
I believe languages can exist that are even faster than Python for
novices to learn. Python 3.0 makes Python even tidier, but Python
itself isn't the most semantically clear language possible. I have
seen that the pervasive reference semantics in Python is one of the
things newbies need the most time to learn and understand. So a
language can be invented (it may be slower than Python, but many
tricks and a JIT may help reduce this problem) where

a = [1, 2, 3]
b = a

makes b a copy-on-write copy of a, that is, without reference
semantics.
Other things, like base-10 floating point numbers and the removal of
other complexity, allow the creation of a more newbie-friendly
language. And then I think I too can see myself using such a simple
but practically useful language for very quick scripts, where running
speed matters less but most possible bugs are avoided because the
language semantics is rich but very clean. Is anyone else interested
in such a language?
Such a language may even be a bit better suited than Python for an
(uncommon) practice called "real time programming", where an artist
writes code that synthesizes sounds and music on the fly ;-)
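
A rough sketch of how such copy-on-write behavior could work, written
in today's Python (the class name CowList and the explicit share()
call are my inventions; in the imagined language a plain "b = a" would
do this implicitly):

```python
class CowList:
    """Sketch of a list with copy-on-write sharing: share() hands out
    a second name for the same storage, and the underlying data is
    duplicated only the first time a shared holder mutates it."""

    def __init__(self, items):
        self._data = list(items)
        self._shared = False

    def share(self):
        clone = CowList.__new__(CowList)
        clone._data = self._data        # same storage, no copy yet
        clone._shared = True
        self._shared = True
        return clone

    def _unshare(self):
        if self._shared:
            self._data = list(self._data)   # the deferred copy happens here
            self._shared = False

    def append(self, x):
        self._unshare()                 # mutation triggers the copy
        self._data.append(x)

    def __getitem__(self, i):
        return self._data[i]

    def __len__(self):
        return len(self._data)
```

Reads stay cheap; only the first write through a shared name pays for
the copy, which is the behavior proposed above.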

-------------------------------

Bye,
bearophile
18 Answers

Aaron Brady

2/26/2008 2:46:00 PM


On Feb 26, 7:36 am, bearophileH...@lycos.com wrote:
> [snip]
>
> So it can be invented
> a language (that may be slower than Python, but many tricks and a JIT
> may help to reduce this problem) where
>
> a = [1, 2, 3]
> b = a
> Makes b a copy-on-write copy of a, that is without reference
> semantics.

Why not b = copyonwrite( a )?

> [snip]
> Such language may even be a bit fitter than Python for an (uncommon)
> practice called "real time programming" where an artist writes code
> that synthesizes sounds and music on the fly ;-)

Subclass the interpreter-- make your own session.

Bearophile

2/26/2008 3:46:00 PM


castiro...@gmail.com:
> Why not b = copyonwrite( a )?
> Subclass the interpreter-- make your own session.

Your idea may work, but I am talking about a new language (with some
small differences, not a revolution). Making such a language efficient
enough may require adding some complex tricks; copy-on-write is just
one of them, a JIT is probably useful, etc.

Thank you, bye,
bearophile

Aaron Brady

2/26/2008 4:59:00 PM


On Feb 26, 9:45 am, bearophileH...@lycos.com wrote:
> castiro...@gmail.com:
>
> > Why not b = copyonwrite( a )?
> > Subclass the interpreter-- make your own session.
>
> Your idea may work, but I am talking about a new language (with some
> small differences, not a revolution). Making such language efficient
> enough may require to add some complex tricks, copy-on-write is just
> one of them, a JIT is probably useful, etc.
>
> Thank you, bye,
> bearophile

It's Unpythonic to compile a machine instruction out of a script. But
maybe in the right situations, with the right constraints on a
function, certain chunks could be native, almost like a mini-
compilation. How much machine instruction do you want to support?

Bearophile

2/26/2008 5:27:00 PM


castiro...@gmail.com:
> It's Unpythonic to compile a machine instruction out of a script. But
> maybe in the right situations, with the right constraints on a
> function, certain chunks could be native, almost like a mini-
> compilation. How much machine instruction do you want to support?

This language is meant for newbies, for very quick scripts, or for
less bug-prone code, so optimizations are just a way to keep such
programs from running 5 times slower than Ruby ones ;-)

Bye,
bearophile

Aaron Brady

2/26/2008 6:13:00 PM


On Feb 26, 11:27 am, bearophileH...@lycos.com wrote:
> castiro...@gmail.com:
>
> > It's Unpythonic to compile a machine instruction out of a script.  But
> > maybe in the right situations, with the right constraints on a
> > function, certain chunks could be native, almost like a mini-
> > compilation.  How much machine instruction do you want to support?
>
> This language is meant for newbies, or for very quick scripts, or for
> less bug-prone code, so optimizations are just a way to avoid such
> programs run 5 times slower than Ruby ones ;-)
>
> Bye,
> bearophile

My first thought is to accept ambiguities, and then disambiguate them
at first compile. Whether you want to record the disambiguations in
the script itself ("do not modify -here-"-style), or an annotation
file, could be optional, and could be both. Queueing an example... .
You could lose a bunch of the parentheses too, oy.

"It looks like you mean, 'if "jackson" exists in namesmap', but there
is also a 'namesmap' folder in the current working directory. Enter
(1) for dictionary, (2) for file system."

[snip]
if 'jackson' in namesmap:
->
if 'jackson' in namesmap: #namesmap.__getitem__
[snip]

automatically.

And while you're at it, get us Starcrafters a command-line interface.

build 3 new barracks at last click location
produce at capacity 30% marines, 20% siege tanks, 10% medics
attack hotspot 9 in attack formation d

def d( army, enemy, terrain ):. ha?

Steven D'Aprano

2/26/2008 10:00:00 PM


On Tue, 26 Feb 2008 05:36:57 -0800, bearophileHUGS wrote:

> So it can be invented a language
> (that may be slower than Python, but many tricks and a JIT may help to
> reduce this problem) where
>
> a = [1, 2, 3]
> b = a
> Makes b a copy-on-write copy of a, that is without reference semantics.

Usability for beginners is a good thing, but not at the expense of
teaching them the right way to do things. Insisting on explicit requests
before copying data is a *good* thing. If it's a gotcha for newbies,
that's just a sign that newbies don't know the Right Way from the Wrong
Way yet. The solution is to teach them, not to compromise on the Wrong
Way. I don't want to write code where the following is possible:

a = [gigabytes of data]
b = a
f(a)  # fast, no copying takes place
g(b)  # also fast, no copying takes place
... more code here
... and pages later
b.append(1)
... suddenly my code hits an unexpected performance drop
... as gigabytes of data get duplicated
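
For contrast, in current Python the copy is explicit at the call site,
which is exactly the property being defended here; a minimal
illustration:

```python
a = list(range(5))
b = a            # no copy: 'b' is just another name for the same list
b.append(99)
print(a[-1])     # 99: the mutation is visible through both names

c = a[:]         # an explicit, visible copy (list(a) or copy.copy(a) work too)
c.append(100)
print(a[-1])     # still 99: 'a' is untouched
```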



--
Steven

Wolfram Hinderer

2/26/2008 10:10:00 PM


On 26 Feb., 14:36, bearophileH...@lycos.com wrote:
> A possible solution to this problem is "optional delimiters". What's
> the path of least resistance to implementing such "optional
> delimiters"? Using comments: for example #} or #: or something
> similar.
> [snip]
> And build the original Python code (it's possible to do the opposite
> too, but it requires a bit more complex script).

Have a look at Tools/Scripts/pindent.py

--
Wolfram

Bearophile

2/26/2008 11:22:00 PM


Steven D'Aprano:
> Usability for beginners is a good thing, but not at the expense of
> teaching them the right way to do things. Insisting on explicit requests
> before copying data is a *good* thing. If it's a gotcha for newbies,
> that's just a sign that newbies don't know the Right Way from the Wrong
> Way yet. The solution is to teach them, not to compromise on the Wrong
> Way. I don't want to write code where the following is possible:
> ...
> ... suddenly my code hits an unexpected performance drop
> ... as gigabytes of data get duplicated

I understand your point of view, and I tend to agree.

But let me express my other point of view. Computer languages are a
way to ask a machine to do some job. As time passes, computers become
faster, and people find it possible to create languages that are
higher level, that is, often more distant from how the CPU actually
performs the job, allowing the human to express the job in a way
closer to how less trained humans talk to each other and perform
jobs. Probably many years ago a language like Python was too costly
in terms of CPU, making it of little use for most non-toy purposes.
But there's a need for higher level computer languages. Today Ruby is
a bit higher-level than Python (despite being rather close). So my
mostly alternative answers to your problem are:

1) The code goes slow when you try to perform that operation? It
means the JIT is "broken", and we have to find a smarter JIT (or the
user will look for a better language). A higher level language means
the user is more free to ignore what's under the hood: the user just
cares that the machine performs the job, regardless of how, and
focuses on what job to do; the low level details of how to do it are
left to the machine. It's the job of the JIT writers to let the user
do that anyway. So the JIT must be even smarter, and for example it
partitions the 1 GB of data into blocks, each one managed with
copy-on-write, so maybe it copies just a few megabytes of memory.
Such a language may need to be quite smart. Despite that, I think
many people today with a 3 GHz CPU may accept a language 5 times
slower than Python that, for example, uses base-10 floating point
numbers (they are different from Python Decimal numbers). Almost
every day on the Python newsgroup a newbie asks if round() is broken
after seeing this:

>>> round(1/3.0, 2)
0.33000000000000002
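
(Incidentally, Python's own decimal module already provides this kind
of base-10 arithmetic on request; a small sketch, in Python 3 syntax:)

```python
from decimal import Decimal, getcontext

getcontext().prec = 28          # the default precision, shown for clarity
third = Decimal(1) / Decimal(3)
print(round(third, 2))          # 0.33, with no binary-float residue
```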
A higher level language (like Mathematica) must be designed to give
more numerically correct answers, even if that may require more CPU.
But such a language isn't just for newbies: if I write a 10-line
program that has to print 100 lines of numbers, I want it to reduce
my coding time, sparing me from thinking about base-2 floating point
numbers. If the language uses higher-level numbers by default I can
ignore that problem, my coding becomes faster, and the bugs decrease.
The same happens with Python integers: they don't overflow, so I may
ignore lots of details (like guarding against possible overflows)
that I have to think about when I use the C language. C is faster,
but such speed isn't necessary if I just need to print 100 lines of
output on a 3 GHz PC. What I need in that situation is a language
that allows me to ignore how numbers are represented by the CPU and
prints the correct numbers to the file. This is just a silly example,
but it may show my point of view (another example is below).

2) You don't process gigabytes of data with this language; it's
designed to solve smaller problems with smaller datasets. If you want
to solve very big problems you have to use a lower level language,
like Python, or C, or assembly. Computers allow us to solve bigger
and bigger problems, but life today is full of little problems too,
like processing a single 50-line text file.

3) You buy an even faster computer, where even copying 1 GB of data
is fast enough.


Wolfram:
>Have a look at Tools/Scripts/pindent.py

Oh, that's it, almost. Thank you.
Bye,
bearophile

-----------------------

Appendix:

Another example, this is a little problem from this page:
http://www.faqs.org/docs/abs/HTML/writingsc...

>Find the sum of all five-digit numbers (in the range 10000 - 99999) containing exactly two out of the following set of digits: { 4, 5, 6 }. These may repeat within the same number, and if so, they count once for each occurrence.<

I can solve it in 3.3 seconds on my old PC with Python like this:

print sum(n for n in xrange(10000, 100000)
          if len(set(str(n)) & set("456")) == 2)

[Note: that's the second version of the code; the first version was
buggy because it contained:
.... & set([4, 5, 6])

So I used the Python shell to see what set(str(12345)) & set([4, 5, 6])
was; the result was an empty set. So it's a type bug. A statically
typed language like D often can't catch such bugs anyway, because
chars are seen as numbers.]
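
For reference, the same computation in Python 3 syntax (range instead
of xrange), with a second formulation of the "exactly two distinct
digits from {4, 5, 6}" test as a sanity check:

```python
total = sum(n for n in range(10000, 100000)
            if len(set(str(n)) & set("456")) == 2)

# An equivalent, more explicit test: count how many of '4', '5', '6'
# appear among the digits of n.
check = sum(n for n in range(10000, 100000)
            if sum(d in str(n) for d in "456") == 2)

print(total == check)   # True: both formulations agree
```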

In Python I can write low-level-style code like this that requires
only 0.4 seconds with Psyco (it's backported from the D version,
because that allowed me to think at a lower level; I was NOT able to
reach such low level and high speed writing a program just for Psyco):

def main():
    digits = [0] * 10
    tot = 0
    for n in xrange(10000, 100000):
        i = n
        digits[4] = 0
        digits[5] = 0
        digits[6] = 0
        digits[i % 10] = 1; i /= 10
        digits[i % 10] = 1; i /= 10
        digits[i % 10] = 1; i /= 10
        digits[i % 10] = 1; i /= 10
        digits[i % 10] = 1
        if (digits[4] + digits[5] + digits[6]) == 2:
            tot += n
    print tot
import psyco; psyco.bind(main)
main()


Or I can solve it in 0.07 seconds in D language (and about 0.05
seconds in very similar C code with -O3 -fomit-frame-pointer):

void main() {
    int tot, d, i;
    int[10] digits;
    for (uint n = 10_000; n < 100_000; n++) {
        digits[4] = 0;
        digits[5] = 0;
        digits[6] = 0;
        i = n;
        digits[i % 10] = 1; i /= 10;
        digits[i % 10] = 1; i /= 10;
        digits[i % 10] = 1; i /= 10;
        digits[i % 10] = 1; i /= 10;
        digits[i % 10] = 1;
        if ((digits[4] + digits[5] + digits[6]) == 2)
            tot += n;
    }
    printf("%d\n", tot);
}

Assembly may suggest even lower level ways to solve the same problem
(using an instruction that computes div and mod at the same time, with
the results going into EAX and EDX?), etc.

But if I just need to solve that "little" problem once, I may want to
reduce the sum of programming time + running time, so in such a
situation the first Python version wins (despite the quickly fixed
bug). That's why today people often use Python instead of C for small
problems. Similar things can be said about a possible language that is
a little higher level than Python.

Bye,
bearophile

Aaron Brady

2/27/2008 1:43:00 AM


On Feb 26, 5:22 pm, bearophileH...@lycos.com wrote:
> [snip]
>
> But if I just need to solve that "little" problem once, I may want to
> reduce the sum of programming time + running time, so in such a
> situation the first Python version wins (despite the quickly fixed
> bug). That's why today people often use Python instead of C for small
> problems. Similar things can be said about a possible language that is
> a little higher level than Python.

You're looking at a few variables.
1) Time to code as a function of person / personal characteristic and
program
2) Time to run as a function of machine and program
3) Bugs (distinct bugs) as a function of person / personal
characteristic and program
3a) Bug's obviousness upon running ... ( person, program ) -- the
program screwed up, but person can't tell 'til later -- ( for program
with exactly one bug, or func. of ( person, program, bug ) )
3b) Bug's time to fix ( person, program [, bug ] )
3c) Bug incidence -- count of bugs the first time through ( person,
program )

(3) assumes you have experts and you're measuring number of bugs &c.
compared to a bug-free ideal in a lab. If no one knows if a program
(say, if it's large) has bugs, high values for (3a) might be
important.
(1)-(3) define different solutions to the same problem as different
programs, i.e. the program states its precise implementation, but then
the only thing that can vary data point to data point is variable
names, i.e. how precise the statement, and only to a degree: you might
get bugs in a memory manager even if you reorder certain ("ideally
reorderable") sequences of statements; and you might get variations if
you use parallel arrays vs. structures vs. parallel variable names.
Otherwise, you can specify an objective, deterministic, not
necessarily binary, metric of similarity and identity of programs.
Otherwise yet, a program maps input to output (+ residue, the
difference in machine state start to completion), so use descriptive
statistics (mean, variance, quartiles, outliers, extrema) on the
answers. E.g., for (2), the fastest C program (sampled) (that maps I-
>O) way surpasses the fastest Perl program (sampled), and it was
written by Steve C. Guru, and we couldn't find Steve Perl Guru; and
besides, the means across programs in C and Perl show no statistically
significant difference at the 96% confidence level. And besides,
there is no algorithm to generate even the fastest-running program (of
a problem/spec) for a machine in a language, much less (1) and (3)!
So you're looking at ( coder with coder trait or traitless, program
problem, program solution, language implementation, op'ing sys.,
hardware, initial state ) for variables in your answers. That's one of
the obstructions anyway to rigorous metrics of languages: you never
run the language. (Steve Traitless Coder-- v. interesting.-- given
nothing but the problem, the install and machine, and the internet--
throw doc. traits and internet connection speed in!-- how good is a
simple random sample?-- or Steve Self-Proclaimed Non-Zero Experience
and Familiarity Perl Coder, or Steve Self-Proclaimed Non-Trivial
Experience and Familiarity Perl Coder.)

And don't forget a bug identity metric too-- if two sprout up while
fixing one, is that one, two, or three? Do the answers to (1) and (2)
vary with count of bugs remaining? If a "program" maps input to
output, then Python has never been written.

That doesn't stop you from saying what you want though - what your
priorities are:
1) Time to code. Important.
2) Time to run. Unimportant.
3a) Bug obviousness. Important.
3b) Bug time to fix. Important.
3c) Bug incidence. Less important.

Ranked.
1) Time to code.
2) Bug obviousness. It's ok if Steve Proposed Language Guru rarely
codes ten lines without a bug, so long as he can always catch them
right away.
3) Bug time to fix.
4) Bug incidence.
unranked) Time to run.

Are you wanting an interpreter that runs an Amazon Cloud A.I. to catch
bugs? That's another $0.10, please, ma'am.

> b.append(1)
> ... suddenly my code hits an unexpected performance drop

Expect it, or use a different data structure.

Steven D'Aprano

2/28/2008 6:46:00 AM


By the way bearophile... the readability of your posts will increase a
LOT if you break them up into paragraphs, rather than using one or two
giant run-on paragraphs.

My comments follow.



On Tue, 26 Feb 2008 15:22:16 -0800, bearophileHUGS wrote:

> Steven D'Aprano:
>> Usability for beginners is a good thing, but not at the expense of
>> teaching them the right way to do things. Insisting on explicit
>> requests before copying data is a *good* thing. If it's a gotcha for
>> newbies, that's just a sign that newbies don't know the Right Way from
>> the Wrong Way yet. The solution is to teach them, not to compromise on
>> the Wrong Way. I don't want to write code where the following is
>> possible: ...
>> ... suddenly my code hits an unexpected performance drop ... as
>> gigabytes of data get duplicated
>
> I understand your point of view, and I tend to agree. But let me express
> my other point of view. Computer languages are a way to ask a machine to
> do some job. As time passes, computers become faster,

But never fast enough, because as they get faster, we demand more from
them.


> and people find
> that it becomes possible to create languages that are higher level, that
> is often more distant from how the CPU actually performs the job,
> allowing the human to express the job in a way closer to how less
> trained humans talk to each other and perform jobs.

Yes, but in practice, there is always a gap between what we say and what
we mean. The discipline of having to write down precisely what we mean is
not something that will ever go away -- all we can do is use "bigger"
concepts, and thus change the places where we have to be precise.

e.g. the difference between writing

index = 0
while index < len(seq):
    do_something_with(seq[index])
    index += 1

and

for x in seq:
    do_something_with(x)


is that iterating over an object is, in some sense, a "bigger" concept
than merely indexing into an array. If seq happens to be an appropriately-
written tree structure, the same for-loop will work, while the while loop
probably won't.
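A sketch of that point (the class and method names here are my own, not from the thread): a tree that defines __iter__ works with the very same for-loop, while the while-loop version has no len() and no seq[index] to fall back on.

```python
class Tree:
    """A tiny binary tree; iterating yields values in sorted order."""
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

    def __iter__(self):
        # In-order traversal: left subtree, this node, right subtree.
        if self.left is not None:
            yield from self.left
        yield self.value
        if self.right is not None:
            yield from self.right

seq = Tree(2, Tree(1), Tree(3))
for x in seq:        # the same for-loop works unchanged
    print(x)
# The indexing version fails: Tree defines no __len__ or __getitem__.
```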


> Probably many years
> ago a language like Python was too costly in terms of CPU, making
> it of little use for most non-toy purposes. But there's a need for
> higher level computer languages. Today Ruby is a bit higher-level than
> Python (despite being rather close). So my mostly alternative answers to
> your problem are: 1) The code goes slow if you try to perform that
> operation? It means the JIT is "broken", and we have to find a smarter
> JIT (and the user will look for a better language).
[...]

Of course I expect that languages will continue to get smarter, but there
will always be a gap between "Do What I Say" and "Do What I Mean".

It may also turn out that, in the future, I won't care about Python4000
copying ten gigabytes of data unexpectedly, because copying 10GB will be
a trivial operation. But I will care about it copying 100 petabytes of
data unexpectedly, and complain that Python4000 is slower than G.

The thing is, make-another-copy and make-another-reference are
semantically different things: they mean something different. Expecting
the compiler to tell whether I want "x = y" to make a copy or to make
another reference is never going to work, not without running "import
telepathy" first. All you can do is shift the Gotcha! moment around.

You should read this article:

http://www.joelonsoftware.com/articles/fog00000...


It specifically talks about C, but it's relevant to Python, and all
hypothetical future languages. Think about string concatenation in Python.
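The concatenation gotcha that article describes, sketched in Python: repeated += on strings can be quadratic, because each step may copy everything built so far, while str.join does one pass. (CPython has since optimized some += cases in place, so treat this as the conceptual cost model rather than a guaranteed slowdown.)

```python
def concat_naive(parts):
    # Each += may copy the whole accumulated string: O(n**2) overall.
    s = ""
    for p in parts:
        s += p
    return s

def concat_join(parts):
    # join() measures once and copies once: O(n) overall.
    return "".join(parts)

parts = ["spam"] * 1000
assert concat_naive(parts) == concat_join(parts)
```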




> A higher level
> language means that the user is more free to ignore what's under the
> hood, the user just cares that the machine will perform the job,
> regardless how, the user focuses the mind on what job to do, the low
> level details regarding how to do it are left to the machine.

More free, yes. Completely free, no.



> Despite that, I think today many people with a 3GHz CPU may accept a
> language 5 times slower than Python, one that for example uses base-10
> floating point numbers (they are different from Python Decimal
> numbers). Almost every day on the Python newsgroup a newbie asks if
> round() is broken after seeing this:
>>>> round(1/3.0, 2)
> 0.33000000000000002
> A higher level language (like Mathematica) must be designed to give
> more numerically correct answers, even if it may require more CPU. But
> such a language isn't just for newbies: if I write a 10-line program
> that has to print 100 lines of numbers, I want it to reduce my coding
> time, sparing me from thinking about base-2 floating point numbers.

Sure. But all you're doing is moving the Gotcha around. Now newbies will
start asking why (2**0.5)**2 doesn't give 2 exactly when (2*0.5)*2 does.
And if you fix that by creating a surd data type, at more performance
cost, you'll create a different Gotcha somewhere else.
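That new gotcha in a doctest-sized sketch: sqrt(2) is irrational, so it has no finite representation in any integer base, while 0.5 happens to be exact in base 2.

```python
x = (2 * 0.5) * 2    # 0.5 is exactly representable in base 2
assert x == 2.0

y = (2 ** 0.5) ** 2  # sqrt(2) is irrational: inexact in ANY base
assert y != 2.0
print(y)             # roughly 2.0000000000000004
```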


> If the
> language use a higher-level numbers by default I can ignore that
> problem,

But you can't. The problem only occurs somewhere else: Decimal is base
10, and there are base 10 numbers that can't be expressed exactly no
matter how many digits you use. They're different from the numbers you
can't express exactly in base 2 numbers, and different from the numbers
you can't express exactly as rationals, but they're there, waiting to
trip you up:

>>> from decimal import Decimal as d
>>> x = d(1)/d(3) # one third
>>> x
Decimal("0.3333333333333333333333333333")
>>> assert x*3 == d(1)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AssertionError
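And the wall moves once more with exact rationals (a sketch using the stdlib fractions module, my example rather than the thread's): one third becomes exact, but irrational values still can't be represented, so an approximation sneaks back in.

```python
from fractions import Fraction

third = Fraction(1, 3)
assert third * 3 == 1    # exact, where both float and Decimal fail

# But sqrt(2) is not a ratio of integers: raising to a non-integral
# power falls back to an approximate float result.
root = Fraction(2) ** Fraction(1, 2)
assert isinstance(root, float)
assert root ** 2 != 2
```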




--
Steven