(Edward G. Nilges)
8/29/2009 9:51:00 AM
On Aug 29, 12:35 pm, bolega <gnuist...@gmail.com> wrote:
> This braggart admits that he had to put this code in TWO books and
> visit it twice to be explained. I am puting the excerpt from pp2-4 of
> this book and the C code. The C code will become indented and syntax
> highlighted once you paste in emacs etc. It is my belief and
> observation on a lot of problems by these so called "demi gods" that
> they are actually all average and no more intelligent. Its all that
> they got some opportunities to study some things at opportune time and
> facilities and also spent a lot of time in a productive environment
> and team.
>
> I know that lisp eval is written more clear than this recursion below
> because I am able to read it easily. and that code is almost self
> explanatory. C is more quirky. When you really mean recursively call
> another function, you are using return so you can have tail
> recursion ???? .
>
> Anyway, its your chance to show how clear C/C++/lisp/Scheme code you
> can write that is clearer. Also, i dont exclude pseudocode but it
> should be clear enough to be instantly translatable to a programming
> language. The real goal is to learn how to write or DERIVE recursion,
> how to enumerate cases, order them, and build recursion. You may even
> put some introductory tuturial and dont have to reply here in ascii
> but can upload a pdf at some link in rapidshare etc. Look how he put
> the code after problem statement and then has the need to explain it.
> Ideally, the discussion should be such that the student or reader
> himself jumps to the solution. That is why I give these unix school of
> pike/thomson/kernighan low grade in almost all their expositions
> except that they wrote earliest books to make millions of dollars in
> royalties and since now they are nobody due to linux, they are poorly
> regurgitating old material.
>
> Enjoy .............
>
> ============================
>
> The Practice of Programming
>
> In 1998, Rob Pike and I were writing The Practice of Programming
> (Addison-Wesley). The
> last chapter of the book, “Notation,” collected a number of examples
> where good notation
> led to better programs and better programming. This included the use
> of simple data specifications
> (printf, for instance), and the generation of code from tables.
> Because of our Unix backgrounds and nearly 30 years of experience with
> tools based on
> regular expression notation, we naturally wanted to include a
> discussion of regular
> expressions, and it seemed mandatory to include an implementation as
> well. Given our
> emphasis on tools, it also seemed best to focus on the class of
> regular expressions found in
> grep—rather than, say, those from shell wildcards—since we could also
> then talk about the
> design of grep itself.
> The problem was that any existing regular expression package was far
> too big. The local
> grep was over 500 lines long (about 10 book pages) and encrusted with
> barnacles. Open
> source regular expression packages tended to be huge—roughly the size
> of the entire
> book—because they were engineered for generality, flexibility, and
> speed; none were
> remotely suitable for pedagogy.
> I suggested to Rob that we find the smallest regular expression
> package that would illustrate
> the basic ideas while still recognizing a useful and nontrivial class
> of patterns. Ideally,
> the code would fit on a single page.
> Rob disappeared into his office. As I remember it now, he emerged in
> no more than an
> hour or two with the 30 lines of C code that subsequently appeared in
> Chapter 9 of The
> Practice of Programming. That code implements a regular expression
> matcher that handles
> the following constructs.
>
> Character Meaning
> c Matches any literal character c.
> . (period) Matches any single character.
> ^ Matches the beginning of the input string.
> $ Matches the end of the input string.
> * Matches zero or more occurrences of the previous character.
>
> This is quite a useful class; in my own experience of using regular
> expressions on a day-today
> basis, it easily accounts for 95 percent of all instances. In many
> situations, solving the
> right problem is a big step toward creating a beautiful program. Rob
> deserves great credit
> for choosing a very small yet important, well-defined, and extensible
> set of features from
> among a wide set of options.
> Rob’s implementation itself is a superb example of beautiful code:
> compact, elegant,
> efficient, and useful. It’s one of the best examples of recursion that
> I have ever seen, and it
> shows the power of C pointers. Although at the time we were most
> interested in conveying
> the important role of good notation in making a program easier to use
> (and perhaps
> easier to write as well), the regular expression code has also been an
> excellent way to
> illustrate algorithms, data structures, testing, performance
> enhancement, and other
> important topics.
>
> Implementation
> In The Practice of Programming, the regular expression matcher is part
> of a standalone program
> that mimics grep, but the regular expression code is completely
> separable from its
> surroundings. The main program is not interesting here; like many Unix
> tools, it reads
> either its standard input or a sequence of files, and prints those
> lines that contain a match
> of the regular expression.
> This is the matching code:
> /* match: search for regexp anywhere in text */
> int match(char *regexp, char *text)
> {
> if (regexp[0] == '^')
> return matchhere(regexp+1, text);
> do { /* must look even if string is empty */
> if (matchhere(regexp, text))
> return 1;} while (*text++ != '\0');
> return 0;
> }
>
> /* matchhere: search for regexp at beginning of text */
> int matchhere(char *regexp, char *text)
> {
> if (regexp[0] == '\0')
> return 1;
> if (regexp[1] == '*')
> return matchstar(regexp[0], regexp+2, text);
> Character Meaning
> c Matches any literal character c.
> . (period) Matches any single character.
> ^ Matches the beginning of the input string.
> $ Matches the end of the input string.
> * Matches zero or more occurrences of the previous character.
> 4 C H A P T E R O N E
> if (regexp[0] == '$' && regexp[1] == '\0')
> return *text == '\0';
> if (*text!='\0' && (regexp[0]=='.' || regexp[0]==*text))
> return matchhere(regexp+1, text+1);
> return 0;}
>
> /* matchstar: search for c*regexp at beginning of text */
> int matchstar(int c, char *regexp, char *text)
> {
> do { /* a * matches zero or more instances */
> if (matchhere(regexp, text))
> return 1;
>
>
>
> } while (*text != '\0' && (*text++ == c || c == '.'));
> return 0;
> }- Hide quoted text -
>
> - Show quoted text -
Many people seem to have been upset with this article in Beautiful
Code. I emailed Kernighan about it and received a reply (I'd met the
guy), but no real answer to the basic problem, which is that C cannot
express programming ideas clearly.
The first problem is that by being so "simple" it fails to implement a
recognizable "regular expression" processor.
I thought a second problem was that it used a value parameter as a
work area, which isn't something I would do in my languages of choice
(C Sharp and VB in recent years), but I found myself doing this in C
when I returned to the language mostly to kick the shit out it.
A third problem is praising a programmer for fast work. There's enough
slipshod work in our industry, and enough programmer overwork as it
is.
I think Brian Kernighan would be the first to admit that Lisp as
opposed to C made a more far-reaching contribution to computer science.