Re: [PATCH v4 15/19] object-file.c: release the "tag" in check_tag()

Jeff King <peff@xxxxxxxx> · Wed, 18 Jan 2023 12:19:32 -0500

On Tue, Jan 17, 2023 at 08:58:24PM +0100, René Scharfe wrote:

> Am 17.01.23 um 18:11 schrieb Ævar Arnfjörð Bjarmason:
> > Fix a memory leak that's been with us ever since c879daa2372 (Make
> > hash-object more robust against malformed objects, 2011-02-05). With
> > "HASH_FORMAT_CHECK" (used by "hash-object" and "replace") we'll parse
> > tags into a throwaway variable on the stack, but weren't freeing the
> > "item->tag" we might malloc() when doing so.
> >
> > The clearing that release_tag_memory() does for us is redundant here,
> > but let's use it as-is anyway. It only has one other existing caller,
> > which does need the tag to be cleared.
> 
> Calling it is better than getting our hands dirty with tag internals
> here.
> 
> There's similar leak in check_commit() in the same file, but plugging it
> would require exporting unparse_commit().  Or perhaps using the heavy
> hammer that is release_commit_memory()?  Anyway, it doesn't seem simple
> to me, so that would be a patch for a separate series.

I think both of these cases are a bit sketchy, because they create an
object struct on the stack, and not via alloc_*_node(), which may
violate assumptions elsewhere. In the case of the tag, it's probably OK.
For the commit, though, the "index" field is going to be 0, which may
confuse release_commit_memory(). It calls free_commit_buffer(), which is
going to use that index to try to free from the buffer slab.

So I'd be wary of calling that. I'm also slightly skeptical of the
existing code that calls parse_commit_buffer(), but I think it works. We
do not attach the buffer to the commit object there (good), and we pass
"0" for check_graph, so the graph code (which IIRC also uses the index
for some slab lookups) isn't run.

I think this code would be better off either:

  1. Just allocating an object struct in the usual way. I understand the
     desire not to spend extra memory, but parse_commit_buffer() is
     going to allocate structs under the hood for the tree and parent
     commits anyway.

  2. The point of this code is to find malformed input to hash-object.
     We're probably better off feeding the buffer to fsck_commit(), etc.
     It does more thorough checks, and these days it does not need an
     object struct at all.

Either of which would naturally fix the leak for tags. I'm not sure
there actually is a leak for commits, as commit structs don't store any
strings themselves.

I don't think any of that needs to hold up Ævar's current series,
though. Fixing the leak this way in the meantime is OK, and then if we
switch to one of the other techniques, the problem just goes away.

-Peff