gordonb.sfo8k
6/22/2011 9:15:00 PM
>> This behavior is inconsistent with how strncpy works, and it seems to
>> be the wrong default: if snprintf did not ensure null-termination, the
>> caller could easily provide it if required. But there seems to be no
In actual practice, it seemed to work like "the caller could easily
forget to provide it if required".
>> efficient way in the other direction.
> It's strncpy that's inconsistent with the other string functions.
> It's designed to work with a very specialized data structure,
> consisting of an array containing some number of non-null characters
> followed by zero or more null bytes. In particular, strncpy can
> pad its destination with *multiple* null bytes. (I think it was
> primarily used to store file names in early versions of Unix.)
Correct. A UNIX V6 (and I believe V7, System III, and some System
V) directory entry was a fixed 16 bytes long: 2 bytes of inode
number (limiting an inode number to 16 bits, which wasn't too bad
given the disk sizes available at the time) and 14 bytes of file
name. It was not that uncommon to run into application software
that would do funny things with 14-character file names, like print
a few garbage characters after the file name, or fail to open the
file, because the application didn't allow for the possibility of
the file name being non-NUL-terminated.
A later interface for directory reading, readdir(), came along when
BSD introduced long file names. The important things about readdir()
here were that the file name was guaranteed NUL-terminated *and*
that its length was provided in another struct member. This was
back-ported to the older systems so applications could use the same
interface on both. Early back-ports managed to forget the guarantee
of NUL-termination on 14-character file names on old-style
filesystems, which sometimes caused problems (app depended on the
NUL, with or without using the length) and sometimes didn't (app
used only the length).
Because comparisons were done on all 14 characters, if the name
entry in a directory was:
char name[14] = {'t', 'e', 's', 't', '.', 'c', '\0', '\0',
                 '\0', '\0', '\0', '\0', 'Z', '\0'};
the name entry was inaccessible, because there was no way to generate
a path name that matched the 'Z' at the end. This was never supposed
to happen, but it did occasionally, due to power supply failures,
bad buffer memory in disk controllers, and other hardware issues.
> Almost all the other string functions (such as strncat) deal with
> C-style null-terminated strings, and attempt to avoid creating
> character arrays that don't have the null terminator.
>
> As you're seeing, this particular data format is *very* deeply
> entwined into the standard C library, and to a lesser extent into
> the language itself. Working with a different format while using
> the C standard library is just going to be difficult.
I've heard a lot of claims of a "better" string library that's
supposed to just drop in with relatively few application changes
(a header file and library), and perhaps use of a typedef to declare
strings and a macro around quoted string literals. This rarely
holds up, even if you change *everything* to use the new string
format (including, say, fopen(), sprintf(), fprintf(), and fgets()).
It might be "better" (more elegant, and under some circumstances
faster, and under some circumstances less error-prone or not subject
to buffer overflows), but the drop-in claim usually doesn't work.
The important thing that usually gets forgotten is that it's not
only the string library functions that have to change, but also
operations like string[n], which yields the n'th character of
string (counting from 0, like all C arrays), and, even worse,
string[n] = 0, which truncates the string (and every version of it
sharing the same memory). There's also string+n, which yields a
string with the first n characters lopped off and which shares
memory with the original string. It's very difficult to make this
"drop-in" even with C++ operator overloading and starting with apps
written to use C strings. Certain types of parsing (often those
that write over part of a string with \0 *temporarily*, then put
the original character back later) typically have to be completely
re-written.