Re: Git string manipulation functions wrong?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, 21 May 2007 16:59:25 +0200 Karl Hasselström <kha@xxxxxxxxxxx> wrote:
> On 2007-05-21 16:36:16 +0200, Petr Baudis wrote:
> > It's the opposite for me - we don't properly set the NUL byte for
> > smoe of our strncpy() calls, but I don't really see his problem with
> > snprintf(), we seem to handle its return value correctly everywhere
> > (except diff.c, but there the buffer sizes should be designed in
> > such a way that an overflow should be impossible).
> 
> I think this kind of detailed case-by-case analysis defeats Timo's
> point, though: that the C library functions make it too easy to write
> bugs. If it's necessary to do non-trivial bounds checking etc. at
> every call site, it doesn't really matter if we currently do get them
> all right; at some point, we _are_ going to miss one. Instead of using
> our collective C-fu to get difficult calls right, we should be using
> it to construct string routines that have low enough overhead that
> it's lost in the noise, and are dead simple to use (and, of course,
> that can be cleanly bypassed in the 1% of cases where it's necessary).

That would be mostly true, except for the fact that without snprintf()
returning how many bytes _would_ have been written, it's much harder to
reliably allocate buffers for the result on the first pass.  For
example, this is a trivial implementation of an function which returns
a freshly-allocated formatted string:

	char *data;
	unsigned long len;
	len = snprintf(NULL, 0, some_fmt, arg1, arg2, arg3);
	if (!len)
		return NULL;
	data = malloc(len+1);
	if (!data)
		return NULL;
	data[len] = '\0';
	snprintf(data, len, some_fmt, arg1, arg2, arg3);
	return data;

You can't do that without a loop if it returns how many bytes were
actually written (although some braindead platforms do that already).
Here's a function which handles both use-cases in an optimal way:

	char *data = NULL;
	unsigned long datalen = 0, len;
	do {
		len = snprintf(data, datalen, some_fmt, arg1, arg2, arg3);
		if (!datalen) {
			datalen = len ? len : 16;
			data = malloc(datalen);
			if (!data)
				return NULL;
		} else if (len >= datalen) {
			void *newmem;
			datalen = (len > datalen)?(len + 1):(datalen +16);
			newmem = realloc(data, datalen);
			if (!newmem) {
				free(data);
				return NULL
			}
		}
	} while (len >= datalen);
	data[len] = '\0';
	return data;

Hopefully, on a nice modern platform, the first iteration will have len
equal to the ideal actual required length and so it will hit the first
case and carefully allocate exactly enough bytes, then on the second
loop through it will fill in exactly the required bytes and return
success.  On one of the abovementioned dain-bramaged systems, this will
loop until snprintf doesn't use all the space in the buffer,
incrementing by some fixed value each time (in this implementation,
16).  It should be obvious that correctly-implemented systems will be
significantly more performant than ones without the useful "feature" of
POSIX-compliance. :-D

Cheers,
Kyle Moffett
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux