gordonb.sfo8k
6/22/2011 9:15:00 PM
>> This behavior is inconsistent with how strncpy works, and it seems to
>> be the wrong default: if snprintf did not ensure null-termination, the
>> caller could easily provide it if required. But there seems to be no
In actual practice, it seemed to work like "the caller could easily
forget to provide it if required".
>> efficient way in the other direction.
> It's strncpy that's inconsistent with the other string functions.
> It's designed to work with a very specialized data structure,
> consisting of an array containing some number of non-null characters
> followed by zero or more null bytes. In particular, strncpy can
> pad its destination with *multiple* null bytes. (I think it was
> primarily used to store file names in early versions of Unix.)
Correct. A UNIX V6 (and I believe V7, System III, and some System
V) directory entry was a fixed 16 bytes long: 2 bytes of inode
number (limiting an inode number to 16 bits, which wasn't too bad
given the disk sizes available at the time) and 14 bytes of file
name. It was not that uncommon to run into application software
that would do funny things with 14-character file names, like print
a few garbage characters after the file name, or fail to open the
file, because the application didn't allow for the possibility of
the file name being non-NUL-terminated.
A later interface for directory reading, readdir(), came along when
BSD introduced long file names. The important things about readdir()
here were that the file name was guaranteed NUL-terminated *and*
that its length was provided in another struct member. This was
back-ported to the older systems so applications could use the same
interface on both. Early back-ports managed to forget the guarantee
of NUL-termination on 14-character file names on old-style
filesystems, which sometimes caused problems (app depended on the
NUL, with or without using the length) and sometimes didn't (app
used only the length).
Because comparisons were done on all 14 characters, if the name
entry in a directory was:
char name[14] = {'t', 'e', 's', 't', '.', 'c', '\0', '\0',
                 '\0', '\0', '\0', '\0', 'Z', '\0'};
the name entry was inaccessible, because there was no way to generate
a path name that matched the 'Z' at the end. This was never supposed
to happen, but it did occasionally, due to power supply failures,
bad buffer memory in disk controllers, and other hardware issues.
> Almost all the other string functions (such as strncat) deal with
> C-style null-terminated strings, and attempt to avoid creating
> character arrays that don't have the null terminator.
>
> As you're seeing, this particular data format is *very* deeply
> entwined into the standard C library, and to a lesser extent into
> the language itself. Working with a different format while using
> the C standard library is just going to be difficult.
I've heard a lot of claims of a "better" string library that's
supposed to just drop in with relatively few application changes
(a header file and library), and perhaps use of a typedef to declare
strings and a macro around quoted string literals. This rarely
holds up, even if you change *everything* to use the new string
format (including, say, fopen(), sprintf(), fprintf(), and fgets()).
It might be "better" (more elegant, and under some circumstances
faster, and under some circumstances less error-prone or not subject
to buffer overflows), but the drop-in claim usually doesn't work.
The important thing that usually gets forgotten is that it's not
only the string library functions that have to change, but also
operations like string[n], which yields the n'th character of
string (counting from 0, like all C arrays), and, even worse,
string[n] = 0, which truncates the string (and every version of it
sharing the same memory). There's also string+n, which yields a
string with the first n characters lopped off and which shares
memory with the original string. It's very difficult to make this
"drop-in" even with C++ operator overloading and starting with apps
written to use C strings. Certain types of parsing (often those
that write over part of a string with \0 *temporarily*, then put
the original character back later) typically have to be completely
re-written.