comp.lang.c

program that checks a c source for syntax errors

Ceriousmall

7/12/2011 10:47:00 PM

Feel free to review this code. I look forward to the comments.

/* 2011/07 Ceriousmall
 * this program checks a C source file for rudimentary syntax errors such as
 * unbalanced parentheses, brackets and braces
 * quotes both double and single
 * escape sequences and comments
 */

#include <stdio.h>

#define OUT 0
#define IN 1
#define MAXLINE 1025 /* maximum input line size */

int at_start, char_const_state, quoted_string_state, comment_state;
int line_num, referance_num, parentheses_detected, brackets_detected;

/* assigns the character string to line[] */
int gotline(char line[], int max_line_length)
{
    int ch, address;

    for (address = 1; address < max_line_length &&
         (ch = getchar()) != EOF && ch != '\n'; ++address)
        line[address] = ch;

    if (ch == '\n')
        line[address++] = ch;

    line[address] = '\0';

    return ch;
}

/* check for basic syntax errors */
void syntaxcheck(char line[], int open_brace_address_map[],
                 int closed_brace_address_map[])
{
    int address;

    line_num += 1;

    for (address = 1; line[address] != '\0'; ++address)
        if (line[address] == '\'' && char_const_state == OUT &&
            quoted_string_state == OUT && comment_state == OUT)
            char_const_state = IN;

        else if (line[address] == '\'' && line[address+1] == '\'' &&
                 char_const_state == IN) {
            char_const_state = OUT;
            ++address;
        }
        else if (line[address] == '\'' && char_const_state == IN)
            char_const_state = OUT;

        else if (line[address] == '"' && char_const_state == OUT &&
                 quoted_string_state == OUT && comment_state == OUT)
            quoted_string_state = IN;

        else if (line[address] == '"' && quoted_string_state == IN)
            quoted_string_state = OUT;

        else if (line[address] == '/' && line[address+1] == '*' &&
                 quoted_string_state == OUT && comment_state == OUT) {
            comment_state = IN;
            ++address;
        }
        else if (line[address] == '*' && line[address+1] == '/' &&
                 comment_state == IN) {
            comment_state = OUT;
            ++address;
        }
        else if (line[address] == '(' && char_const_state == OUT &&
                 quoted_string_state == OUT && comment_state == OUT)
            ++parentheses_detected;

        else if (line[address] == ')' && char_const_state == OUT &&
                 quoted_string_state == OUT && comment_state == OUT)
            --parentheses_detected;

        else if (line[address] == '[' && char_const_state == OUT &&
                 quoted_string_state == OUT && comment_state == OUT)
            ++brackets_detected;

        else if (line[address] == ']' && char_const_state == OUT &&
                 quoted_string_state == OUT && comment_state == OUT)
            --brackets_detected;

        else if (line[address] == '{' && char_const_state == OUT &&
                 quoted_string_state == OUT && comment_state == OUT) {
            open_brace_address_map[line_num] = address;
            referance_num = line_num+1;
        }
        else if (line[address] == '}' && char_const_state == OUT &&
                 quoted_string_state == OUT && comment_state == OUT) {
            closed_brace_address_map[line_num] = address;

            while (open_brace_address_map[referance_num] == 0)
                --referance_num;

            if (open_brace_address_map[referance_num] > 0) {
                open_brace_address_map[referance_num] = 0;
                closed_brace_address_map[line_num] = 0;
            }
        }

    if (at_start == EOF) {
        if (char_const_state == IN)
            printf("syntax error...... fragmented character constant.\n");

        if (quoted_string_state == IN)
            printf("syntax error...... character string missing closing argument.\n");

        if (comment_state == IN)
            printf("syntax error...... expected '*/' token after identifier.\n");

        if (parentheses_detected != 0)
            printf("syntax error...... unbalanced parentheses detected.\n");

        if (brackets_detected != 0)
            printf("syntax error...... unbalanced brackets detected.\n");

        for (line_num = 1; line_num < MAXLINE; ++line_num) {
            if (closed_brace_address_map[line_num] > 0) {
                printf("syntax error......line(%d),", line_num);
                printf(" col(%d), expected identifier before '}' token.\n",
                       closed_brace_address_map[line_num]);
            }
            if (open_brace_address_map[line_num] > 0) {
                printf("syntax error......line(%d),", line_num);
                printf(" col(%d), closing argument missing, expected '}'.\n",
                       open_brace_address_map[line_num]);
            }
        }
    }
}

/* time to execute the entire construct */
int main(void)
{
    char line[MAXLINE];
    int open_brace_address_map[MAXLINE], closed_brace_address_map[MAXLINE];

    at_start = char_const_state = quoted_string_state = comment_state = OUT;
    referance_num = parentheses_detected = brackets_detected = 0;

    for (line_num = 1; line_num < MAXLINE; ++line_num) {
        open_brace_address_map[line_num] = 0;
        closed_brace_address_map[line_num] = 0;
    }
    line_num = referance_num;
    open_brace_address_map[line_num] = closed_brace_address_map[line_num] = -1;

    while (at_start != EOF) {
        at_start = gotline(line, MAXLINE);
        syntaxcheck(line, open_brace_address_map, closed_brace_address_map);
    }
    return 0;
}

4 Answers

Ben Bacarisse

7/13/2011 1:50:00 AM


Ceriousmall <divadsmall@gmail.com> writes:

> feel free to review this code I look forward to the
> comments............
>
> /* 2011/07 Ceriousmall. . . .
> * this program checks a C source file for rudimentary syntax errors
> such as
> * unbalanced parentheses, brackets and braces
> * quotes both double and single
> * escape sequences and comments
> */
>
> #include <stdio.h>
>
> #define OUT 0
> #define IN 1
> #define MAXLINE 1025 /* maximum input line size */

It's always more satisfying to avoid limits like this.
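
One way to avoid a fixed line limit, sketched here as an assumption rather than anything proposed in the thread, is to read the input a character at a time and keep only line and column counters. The names `struct position` and `find_first` are hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: 1-based line and column of the character last read. */
struct position {
    long line;
    long col;
};

/*
 * Scan the stream one character at a time, with no line buffer and so
 * no maximum line length. Returns 1 and fills *where with the position
 * of the first occurrence of target, or returns 0 if target never occurs.
 */
static int find_first(FILE *in, int target, struct position *where)
{
    struct position cur = { 1, 0 };
    int ch;

    while ((ch = getc(in)) != EOF) {
        ++cur.col;
        if (ch == target) {
            *where = cur;
            return 1;
        }
        if (ch == '\n') {       /* the next character starts a new line */
            ++cur.line;
            cur.col = 0;
        }
    }
    return 0;
}
```

The same counters could feed a syntax checker directly, so nothing in the program would depend on how long a source line is.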

> int at_start, char_const_state, quoted_string_state, comment_state;
> int line_num, referance_num, parentheses_detected, brackets_detected;
>
> /* assigns the character string to line[] */
> int gotline(char line[], int max_line_length)
> {
> int ch, address;
>
> for (address = 1; address < max_line_length && (ch = getchar()) !=
> EOF && ch != '\n'; ++address)
> line[address] = ch;
>
> if (ch == '\n')
> line[address++] = ch;
>
> line[address] = '\0';

This can write outside the line array. What's wrong with line[0]? You
seem to not want to use it.

> return ch;
> }
>
> /* check for basic syntax errors */
> void syntaxcheck(char line[], int open_brace_address_map[], int
> closed_brace_address_map[])
> {
> int address;
>
> line_num += 1;
>
> for (address = 1; line[address] != '\0'; ++address)
> if (line[address] == '\'' && char_const_state == OUT &&
> quoted_string_state == OUT && comment_state == OUT)
> char_const_state = IN;
>
> else if (line[address] == '\'' && line[address+1] == '\'' &&
> char_const_state == IN) {

What's this case for?

> char_const_state = OUT;
> ++address;
> }
> else if (line[address] == '\'' && char_const_state == IN)
> char_const_state = OUT;

You know my view of this sort of state variable as opposed to plain
Boolean values. You obviously disagree, so I won't make the point
again!

> else if (line[address] == '"' && char_const_state == OUT &&
> quoted_string_state == OUT && comment_state == OUT)
> quoted_string_state = IN;
>
> else if (line[address] == '"' && quoted_string_state == IN)
> quoted_string_state = OUT;
>
> else if (line[address] == '/' && line[address+1] == '*' &&
> quoted_string_state == OUT && comment_state == OUT) {
> comment_state = IN;
> ++address;
> }
> else if (line[address] == '*' && line[address+1] == '/' &&
> comment_state == IN) {
> comment_state = OUT;
> ++address;
> }
> else if (line[address] == '(' && char_const_state == OUT &&
> quoted_string_state == OUT && comment_state == OUT)
> ++parentheses_detected;
>
> else if (line[address] == ')' && char_const_state == OUT &&
> quoted_string_state == OUT && comment_state == OUT)
> --parentheses_detected;
>
> else if (line[address] == '[' && char_const_state == OUT &&
> quoted_string_state == OUT && comment_state == OUT)
> ++brackets_detected;
>
> else if (line[address] == ']' && char_const_state == OUT &&
> quoted_string_state == OUT && comment_state == OUT)
> --brackets_detected;
>
> else if (line[address] == '{' && char_const_state == OUT &&
> quoted_string_state == OUT && comment_state == OUT) {
> open_brace_address_map[line_num] = address;
> referance_num = line_num+1;
> }
> else if (line[address] == '}' && char_const_state == OUT &&
> quoted_string_state == OUT && comment_state == OUT) {
> closed_brace_address_map[line_num] = address;
>
> while (open_brace_address_map[referance_num] == 0)
> --referance_num;
>
> if (open_brace_address_map[referance_num] > 0) {
> open_brace_address_map[referance_num] = 0;
> closed_brace_address_map[line_num] = 0;
> }
> }
>
> if (at_start == EOF) {
> if (char_const_state == IN)
> printf("syntax error...... fragmented character constant.\n");
>
> if (quoted_string_state == IN)
> printf("syntax error...... character string missing closing
> argument.\n");
>
> if (comment_state == IN)
> printf("syntax error...... expected '*/' token after identifier.
> \n");
>
> if (parentheses_detected != 0)
> printf("syntax error...... unbalanced parentheses detected.\n");
>
> if (brackets_detected != 0)
> printf("syntax error...... unbalanced brackets detected.\n");
>
> for (line_num = 1; line_num < MAXLINE; ++line_num) {
> if (closed_brace_address_map[line_num] > 0) {
> printf("syntax error......line(%d),", line_num);
> printf(" col(%d), expected identifier before '}' token.\n",
> closed_brace_address_map[line_num]);
> }
> if (open_brace_address_map[line_num] > 0) {
> printf("syntax error......line(%d),", line_num);
> printf(" col(%d), closing argument missing, expected '}'.
> \n", open_brace_address_map[line_num]);
> }
> }
> }
> }
>
> /* time to execute the entire construct */
> int main(void)
> {
> char line[MAXLINE];
> int open_brace_address_map[MAXLINE],
> closed_brace_address_map[MAXLINE];
>
> at_start = char_const_state = quoted_string_state = comment_state =
> OUT;
> referance_num = parentheses_detected = brackets_detected = 0;
>
> for (line_num = 1; line_num < MAXLINE; ++line_num) {
> open_brace_address_map[line_num] = 0;
> closed_brace_address_map[line_num] = 0;
> }
> line_num = referance_num;
> open_brace_address_map[line_num] = closed_brace_address_map[line_num]
> = -1;
>
> while (at_start != EOF) {
> at_start = gotline(line, MAXLINE);
> syntaxcheck(line, open_brace_address_map,
> closed_brace_address_map);
> }
> return 0;
> }

I don't understand what the brace_address_map arrays are for. Maybe a
comment about them would help.

Cases to consider:

(a) C can have continuation lines.
(b) array[(x]); might be described as unbalanced parentheses.
(c) Similarly, you might want to check for )( and ][.
(d) This is valid C:

#define OPEN {
int main(void) { return 0; }

and fails, whereas this is wrong and passes:

#define OPEN {
#define CLOSE }
int main(void) OPEN return 0;

I suspect you will have to rule out any use of the pre-processor because
of all the tricks it can play. Another is:

#define STR(x) #x
STR(})

(e) Technically, there are trigraphs and digraphs to consider as well.
There is no shame in ignoring these.
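
Cases (b) and (c) come down to checking nesting order rather than totals. A minimal sketch, assuming a stack of pending openers. The names `opener_for` and `balanced` and the depth limit are hypothetical, and quotes, comments and the pre-processor are deliberately ignored here.

```c
#include <stddef.h>

#define MAX_DEPTH 256   /* arbitrary nesting limit for this sketch */

/* Return the opener that matches a closer, or 0 if ch is not a closer. */
static char opener_for(char ch)
{
    switch (ch) {
    case ')': return '(';
    case ']': return '[';
    case '}': return '{';
    default:  return 0;
    }
}

/*
 * Return 1 if every closer in text matches the most recent unmatched
 * opener and nothing is left open at the end; return 0 otherwise.
 */
static int balanced(const char *text)
{
    char stack[MAX_DEPTH];
    size_t depth = 0;

    for (; *text != '\0'; ++text) {
        char ch = *text;

        if (ch == '(' || ch == '[' || ch == '{') {
            if (depth == MAX_DEPTH)
                return 0;       /* too deep for this sketch */
            stack[depth++] = ch;
        } else if (opener_for(ch) != 0) {
            if (depth == 0 || stack[depth - 1] != opener_for(ch))
                return 0;       /* stray or mismatched closer */
            --depth;
        }
    }
    return depth == 0;
}
```

A pair of counters accepts both `array[(x]);` and `)(` because the totals come out to zero. The stack rejects them because the closer does not match the opener on top.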

--
Ben.

gw7rib

7/13/2011 9:08:00 PM


On Jul 12, 11:46 pm, Ceriousmall <divadsm...@gmail.com> wrote:
> feel free to review this code I look forward to the
> comments............

Just a small point. This bit of code:

>                 if (line[address] == '\'' && char_const_state == OUT &&
> quoted_string_state == OUT && comment_state == OUT)
>                         char_const_state = IN;
>
>                 else if (line[address] == '\'' && line[address+1] == '\'' &&
> char_const_state == IN) {
>                         char_const_state = OUT;
>                         ++address;
>                 }
>                 else if (line[address] == '\'' && char_const_state == IN)
>                         char_const_state = OUT;
>
>                 else if (line[address] == '"' && char_const_state == OUT &&
> quoted_string_state == OUT && comment_state == OUT)
>                         quoted_string_state = IN;
>
>                 else if (line[address] == '"' && quoted_string_state == IN)
>                         quoted_string_state = OUT;
>
>                 else if (line[address] == '/' && line[address+1] == '*' &&
> quoted_string_state == OUT && comment_state == OUT) {
>                                 comment_state = IN;
>                                 ++address;
>                 }
>                 else if (line[address] == '*' && line[address+1] == '/' &&
> comment_state == IN) {
>                         comment_state = OUT;
>                         ++address;
>                 }
>                 else if (line[address] == '(' && char_const_state == OUT &&
> quoted_string_state == OUT && comment_state == OUT)
>                         ++parentheses_detected;
>
>                 else if (line[address] == ')' && char_const_state == OUT &&
> quoted_string_state == OUT && comment_state == OUT)
>                         --parentheses_detected;
>
>                 else if (line[address] == '[' && char_const_state == OUT &&
> quoted_string_state == OUT && comment_state == OUT)
>                         ++brackets_detected;
>
>                 else if (line[address] == ']' && char_const_state == OUT &&
> quoted_string_state == OUT && comment_state == OUT)
>                         --brackets_detected;
>
>                 else if (line[address] == '{' && char_const_state == OUT &&
> quoted_string_state == OUT && comment_state == OUT) {
>                         open_brace_address_map[line_num] = address;
>                         referance_num = line_num+1;
>                 }
>                 else if (line[address] == '}' && char_const_state == OUT &&
> quoted_string_state == OUT && comment_state == OUT) {
>                         closed_brace_address_map[line_num] = address;

might look better if you wrote it as a switch. Might even run faster,
too.
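
A sketch of the switch form, assuming the three state tests are hoisted into a single check so that each case no longer repeats them. The `struct counts` type and the function `tally` are hypothetical, and quote and comment handling are left out to keep the focus on the dispatch.

```c
/* Hypothetical tallies; a positive value means more openers than closers. */
struct counts {
    int parens;
    int brackets;
    int braces;
};

/*
 * Update the tallies for one character. in_code is nonzero when the
 * character lies outside any character constant, string or comment,
 * which stands in for the repeated "== OUT" tests in each branch.
 */
static void tally(struct counts *c, int ch, int in_code)
{
    if (!in_code)
        return;

    switch (ch) {
    case '(': ++c->parens;   break;
    case ')': --c->parens;   break;
    case '[': ++c->brackets; break;
    case ']': --c->brackets; break;
    case '{': ++c->braces;   break;
    case '}': --c->braces;   break;
    default:                 break;
    }
}
```

Whether this runs faster depends on the compiler. The clearer gain is that the state condition is written once.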

Robert Spanjaard

7/13/2011 9:18:00 PM


On 07/13/2011 11:07 PM, Paul N wrote:
> On Jul 12, 11:46 pm, Ceriousmall<divadsm...@gmail.com> wrote:
>> feel free to review this code I look forward to the
>> comments............
>
> Just a small point. This bit of code:
>
>> if (line[address] == '\'' && char_const_state == OUT &&
>> quoted_string_state == OUT && comment_state == OUT)
>> char_const_state = IN;
>>
[...]
>> else if (line[address] == '}' && char_const_state == OUT &&
>> quoted_string_state == OUT && comment_state == OUT) {
>> closed_brace_address_map[line_num] = address;
>
> might look better if you wrote it as a switch. Might even run faster,
> too.

I think the OP did one of the exercises at the end of chapter 1 of K&R's
TCPL 2nd Edition: "Exercise 1-24. Write a program to check a C program for
rudimentary syntax errors like unmatched parentheses, brackets and braces.
Don't forget about quotes, both single and double, escape sequences, and
comments. (This program is hard if you do it in full generality.)"

So while most suggestions for improvement might be valid, it would be nice
to see if someone can improve the program only using stuff that's handled by
the first chapter. :-)
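
In that spirit, here is a sketch of the escape-sequence part of the exercise that sticks to constructs from chapter 1: character arrays, `for`, `if`/`else`, symbolic constants and functions. The function `quotes_closed` and the NONE/SINGLE/DOUBLE constants are hypothetical, and comments are ignored for brevity.

```c
#define NONE   0    /* not inside any quoted text */
#define SINGLE 1    /* inside a character constant */
#define DOUBLE 2    /* inside a string literal */

/* Return 1 if every quote in line[] is closed, 0 otherwise. */
int quotes_closed(char line[])
{
    int i, state;

    state = NONE;
    for (i = 0; line[i] != '\0'; ++i) {
        if (state != NONE && line[i] == '\\') {
            if (line[i+1] != '\0')
                ++i;            /* skip the escaped character */
        } else if (state == NONE && line[i] == '\'')
            state = SINGLE;
        else if (state == NONE && line[i] == '"')
            state = DOUBLE;
        else if (state == SINGLE && line[i] == '\'')
            state = NONE;
        else if (state == DOUBLE && line[i] == '"')
            state = NONE;
    }
    return state == NONE;
}
```

Skipping whatever follows a backslash handles `'\''` and `"a\"b"` with one rule, so no special case for two adjacent quotes is needed.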

--
Regards, Robert http://www....

Ceriousmall

7/14/2011 2:45:00 PM


The open and closed brace address maps keep track of the line and
column at which each brace occurs. Each map is indexed by line number,
and the value stored is the brace's column, taken from the subscript of
the line array. Starting the line array at subscript zero would have
produced a column zero.
It does need a comment.
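
A sketch of an alternative bookkeeping scheme, not the method used in the posted program: instead of maps indexed by line number, keep a stack holding the line and column of each `{` that has not been closed yet. The names `note_brace`, `report_unclosed`, `open_line`, `open_col` and the nesting limit are hypothetical.

```c
#include <stdio.h>

#define MAX_OPEN 100    /* arbitrary nesting limit for this sketch */

int open_line[MAX_OPEN];    /* line of each unmatched '{' */
int open_col[MAX_OPEN];     /* column of each unmatched '{' */
int open_count = 0;         /* number of unmatched '{' so far */

/*
 * Record one brace seen at (line, col). Returns 0 normally, or -1 for a
 * '}' with no matching '{' or for nesting deeper than MAX_OPEN.
 */
int note_brace(int ch, int line, int col)
{
    if (ch == '{') {
        if (open_count == MAX_OPEN)
            return -1;
        open_line[open_count] = line;
        open_col[open_count] = col;
        ++open_count;
    } else if (ch == '}') {
        if (open_count == 0)
            return -1;
        --open_count;
    }
    return 0;
}

/* At end of input, report every '{' that was never closed. */
void report_unclosed(void)
{
    int i;

    for (i = 0; i < open_count; ++i)
        printf("line(%d), col(%d): '{' is never closed\n",
               open_line[i], open_col[i]);
}
```

Because positions are stored per brace rather than per line, two braces on the same line do not overwrite each other, and the storage needed depends on nesting depth rather than on the number of lines in the file.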