Re: [PATCH v4 15/19] object-file.c: release the "tag" in check_tag()

René Scharfe <l.s.r@xxxxxx> · Wed, 18 Jan 2023 19:05:32 +0100

Am 18.01.23 um 18:19 schrieb Jeff King:
> On Tue, Jan 17, 2023 at 08:58:24PM +0100, René Scharfe wrote:
>
>> Am 17.01.23 um 18:11 schrieb Ævar Arnfjörð Bjarmason:
>>> Fix a memory leak that's been with us ever since c879daa2372 (Make
>>> hash-object more robust against malformed objects, 2011-02-05). With
>>> "HASH_FORMAT_CHECK" (used by "hash-object" and "replace") we'll parse
>>> tags into a throwaway variable on the stack, but weren't freeing the
>>> "item->tag" we might malloc() when doing so.
>>>
>>> The clearing that release_tag_memory() does for us is redundant here,
>>> but let's use it as-is anyway. It only has one other existing caller,
>>> which does need the tag to be cleared.
>>
>> Calling it is better than getting our hands dirty with tag internals
>> here.
>>
>> There's similar leak in check_commit() in the same file, but plugging it
>> would require exporting unparse_commit().  Or perhaps using the heavy
>> hammer that is release_commit_memory()?  Anyway, it doesn't seem simple
>> to me, so that would be a patch for a separate series.
>
> I think both of these cases are a bit sketchy, because they create an
> object struct on the stack, and not via alloc_*_node(), which may
> violate assumptions elsewhere. In the case of the tag, it's probably OK.
> For the commit, though, the "index" field is going to be 0, which may
> confuse release_commit_memory(). It calls free_commit_buffer(), which is
> going to use that index to try to free from the buffer slab.
>
> So I'd be wary of calling that. I'm also slightly skeptical of the
> existing code that calls parse_commit_buffer(), but I think it works. We
> do not attach the buffer to the commit object there (good), and we pass
> "0" for check_graph, so the graph code (which IIRC also uses the index
> for some slab lookups) isn't run.
>
> I think this code would be better off either:
>
>   1. Just allocating an object struct in the usual way. I understand the
>      desire not to spend extra memory, but parse_commit_buffer() is
>      going to allocate structs under the hood for the tree and parent
>      commits anyway.
>
>   2. The point of this code is to find malformed input to hash-object.
>      We're probably better off feeding the buffer to fsck_commit(), etc.
>      It does more thorough checks, and these days it does not need an
>      object struct at all.

I like the second one, as long as it won't check too much.  c879daa237
(Make hash-object more robust against malformed objects, 2011-02-05) added
the checks that are now in object-file.c and intended to only validate the
internal structure of objects, not relations between.  It gave the example
to allow adding a commit before its tree, which should be allowed.  And
IIUC fsck_object() fits that bill.

> Either of which would naturally fix the leak for tags. I'm not sure
> there actually is a leak for commits, as commit structs don't store any
> strings themselves.

parse_commit_buffer() allocates the list of parents.

Hmm, and it looks them up.  Doesn't this violate the goal to allow
dangling references?

> I don't think any of that needs to hold up Ævar's current series,
> though. Fixing the leak this way in the meantime is OK, and then if we
> switch to one of the other techniques, the problem just goes away.

Right.

René