On Mon, Nov 23, 2020 at 01:01:11PM +0100, Ævar Arnfjörð Bjarmason wrote: > @@ -46,9 +48,17 @@ static int verify_tag(char *buffer, unsigned long size) > struct object_id oid; > const char *object, *type_line, *tag_line, *tagger_line, *lb, *rb, *p; > size_t len; > - > - if (size < 84) > - return error("wanna fool me ? you obviously got the size wrong !"); > + int minimum_size = > + /* Minimum possible input, see t/t3800-mktag.sh */ > + strlen("object ") + the_hash_algo->hexsz + strlen("\n") + > + strlen("type tag\n") + > + strlen("tag x\n") + > + strlen("tagger . <> 0 +0000\n") + > + strlen("\n"); This is an obvious improvement, but I have to wonder whether this kind of "minimum size" up-front check is really buying us much. We do a lot of fixed-size memcmps in the rest of the function. In the early bits, like: if (memcmp(type_line - 1, "\ntype ", 6)) return error("char%d: could not find \"\\ntype \"", 47); that is saving us from going off the edge of a too-small input (though wow, that bare "47" is really horrific). But later, we'll do this: if (memcmp(tagger_line, "tagger ", 7)) return error("char%"PRIuMAX": could not find \"tagger \"", (uintmax_t) (tagger_line - buffer)); We've already parsed "tag x" at this point, and "x" can be arbitrarily long. Are we sure we have 7 bytes of the buffer left? The buffer always has a trailing NUL in it, so it would never match but memcmp() is allowed to look at the whole thing (e.g., it can start at the end, or do quadword comparisons). And that usually wouldn't matter, but may depending on how the strbuf grows, and whether our heap buffer is near the end of the last allocate page. Try this: git init git commit --allow-empty -m foo { echo "object $(git rev-parse HEAD)" echo "type commit" perl -e 'print "tag "; print "a" x 4026; print "\n"' } | git mktag which ASan will complain about, since the "tagger" memcmp reads past the end of the buffer (I found 4026 by running that in a loop of 1..4096). I think this whole thing would be better written to just parse left-to-right within the buffer, using bits like skip_prefix() that rely on us having a trailing NUL (which we always will, and there should not be one inside the header section). This is how fsck_tag() does it. In fact, it seems like we could just reuse fsck_tag() here. It wants to pass an oid to report(), but it would be OK to use null_oid here; ultimately it just ends up back in our callback error_func(). -Peff