Use of strbuf.buf when strbuf.len == 0

Junio C Hamano <gitster@xxxxxxxxx> · Wed, 26 Sep 2007 23:21:24 -0700

I am starting to hate some parts of the strbuf API.

Here is an example.  Can you spot what goes wrong?

static int handle_file(const char *path,
	 unsigned char *sha1, const char *output)
{
	SHA_CTX ctx;
	char buf[1024];
	int hunk = 0, hunk_no = 0;
	struct strbuf one, two;
	...
	if (sha1)
		SHA1_Init(&ctx);

	strbuf_init(&one, 0);
	strbuf_init(&two,  0);
	while (fgets(buf, sizeof(buf), f)) {
		if (!prefixcmp(buf, "<<<<<<< "))
			hunk = 1;
		else if (!prefixcmp(buf, "======="))
			hunk = 2;
		else if (!prefixcmp(buf, ">>>>>>> ")) {
			int cmp = strbuf_cmp(&one, &two);

			hunk_no++;
			hunk = 0;
			if (cmp > 0) {
				strbuf_swap(&one, &two);
			}
			if (out) {
				fputs("<<<<<<<\n", out);
				fwrite(one.buf, one.len, 1, out);
				fputs("=======\n", out);
				fwrite(two.buf, two.len, 1, out);
				fputs(">>>>>>>\n", out);
			}
			if (sha1) {
				SHA1_Update(&ctx, one.buf, one.len + 1);
				SHA1_Update(&ctx, two.buf, two.len + 1);
			}
			strbuf_reset(&one);
			strbuf_reset(&two);
		} else if (hunk == 1)
			strbuf_addstr(&one, buf);
		else if (hunk == 2)
			strbuf_addstr(&two, buf);
		else if (out)
			fputs(buf, out);
	}
	strbuf_release(&one);
	strbuf_release(&two);
	...
}

When one or two are empty and the caller asked for checksumming,
the code still relies on one.buf being an allocated memory with
an extra NUL termination and tries to feed the NULL pointer to
SHA1_Update() with length of 1!

An obvious workaround is to say "strbuf_init(&one, !!sha1)" to
force the allocation.  However, because the second parameter to
strbuf_init() is defined to be merely a hint, we should not rely
on strbuf_init() with non-zero hint to have allocation from the
beginning.  It is assuming too much.

A more defensive way would be for the user to do something like:

	SHA1_Update(&ctx, one.buf ? one.buf : "", one.len + 1);
	SHA1_Update(&ctx, two.buf ? two.buf : "", two.len + 1);

which I am leaning towards, but this looks ugly.

I was already bitten by another bug in strbuf_setlen() that
shared the same roots; an empty strbuf can have two internal
representations ("alloc == 0 && buf == NULL" vs "alloc != 0 &&
buf != NULL") and they behave differently.

For example, this happens to be Ok but the validity of it is
subtle.  If a or b is empty, memcmp may get a NULL pointer but
we ask only 0 byte comparison.

        int strbuf_cmp(struct strbuf *a, struct strbuf *b)
        {
                int cmp;
                if (a->len < b->len) {
                        cmp = memcmp(a->buf, b->buf, a->len);
                        return cmp ? cmp : -1;
                } else {
                        cmp = memcmp(a->buf, b->buf, b->len);
                        return cmp ? cmp : a->len != b->len;
                }
        }

It might be an easier and safer fix to define that strbuf_init()
to always have allocation.  Use of a strbuf in the code _and_
not adding any contents to the buffer should be an exception and
avoiding malloc()/free() for that special case feels optimizing
for the wrong case.

However, there are strbuf instances that are not initialized
(i.e. in BSS or initialized by declaring with STRBUF_INIT), so
we still need to handle (.len == 0 && .alloc == 0) case
specially anyway.

It would be appreciated if somebody with a fresh pair of eyes
can go over the strbuf series one more time to make sure that we
do not try to blindly use buf.buf, assuming buf.buf[0] is NUL if
(buf.len == 0).

-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html