Re: [PATCH] mktag: don't check SHA-1 object length under SHA-256

Jeff King <peff@xxxxxxxx> · Mon, 23 Nov 2020 16:34:21 -0500

On Mon, Nov 23, 2020 at 01:01:11PM +0100, Ævar Arnfjörð Bjarmason wrote:

> @@ -46,9 +48,17 @@ static int verify_tag(char *buffer, unsigned long size)
>  	struct object_id oid;
>  	const char *object, *type_line, *tag_line, *tagger_line, *lb, *rb, *p;
>  	size_t len;
> -
> -	if (size < 84)
> -		return error("wanna fool me ? you obviously got the size wrong !");
> +	int minimum_size =
> +		/* Minimum possible input, see t/t3800-mktag.sh */
> +		strlen("object ") + the_hash_algo->hexsz + strlen("\n") +
> +		strlen("type tag\n") +
> +		strlen("tag x\n") +
> +		strlen("tagger . <> 0 +0000\n") +
> +		strlen("\n");

This is an obvious improvement, but I have to wonder whether this kind
of "minimum size" up-front check is really buying us much. We do a lot
of fixed-size memcmps in the rest of the function.

In the early bits, like:

  if (memcmp(type_line - 1, "\ntype ", 6))
          return error("char%d: could not find \"\\ntype \"", 47);

that is saving us from going off the edge of a too-small input (though
wow, that bare "47" is really horrific).

But later, we'll do this:

  if (memcmp(tagger_line, "tagger ", 7))
           return error("char%"PRIuMAX": could not find \"tagger \"",
                   (uintmax_t) (tagger_line - buffer));

We've already parsed "tag x" at this point, and "x" can be arbitrarily
long. Are we sure we have 7 bytes of the buffer left?

The buffer always has a trailing NUL in it, so it would never match but
memcmp() is allowed to look at the whole thing (e.g., it can start at
the end, or do quadword comparisons). And that usually wouldn't matter,
but may depending on how the strbuf grows, and whether our heap buffer
is near the end of the last allocate page. Try this:

  git init
  git commit --allow-empty -m foo
  {
    echo "object $(git rev-parse HEAD)"
    echo "type commit"
    perl -e 'print "tag "; print "a" x 4026; print "\n"'
  } | git mktag

which ASan will complain about, since the "tagger" memcmp reads past the
end of the buffer (I found 4026 by running that in a loop of 1..4096).

I think this whole thing would be better written to just parse
left-to-right within the buffer, using bits like skip_prefix() that rely
on us having a trailing NUL (which we always will, and there should not
be one inside the header section).

This is how fsck_tag() does it. In fact, it seems like we could just
reuse fsck_tag() here. It wants to pass an oid to report(), but it would
be OK to use null_oid here; ultimately it just ends up back in our
callback error_func().

-Peff