Re: [PATCH 1/3] fsck: free tree buffers after walking unreachable objects

Jeff King <peff@xxxxxxxx> · Thu, 22 Sep 2022 14:58:17 -0400

On Thu, Sep 22, 2022 at 11:40:05AM -0700, Junio C Hamano wrote:

> > diff --git a/builtin/fsck.c b/builtin/fsck.c
> > index f7916f06ed..34e575a170 100644
> > --- a/builtin/fsck.c
> > +++ b/builtin/fsck.c
> > @@ -228,6 +228,8 @@ static void mark_unreachable_referents(const struct object_id *oid)
> >  
> >  	options.walk = mark_used;
> >  	fsck_walk(obj, NULL, &options);
> > +	if (obj->type == OBJ_TREE)
> > +		free_tree_buffer((struct tree *)obj);
> >  }
> 
> Unlike codepaths like mark_object(), which uses the REACHABLE bit to
> avoid the walker coming into an already marked objects, we have no
> protection that says "this tree is already marked as USED, so lets
> not go into its contents" (it would be a disaster if we free tree
> buffer here and then later end up calling the function on the same
> tree), but it is OK because this is an unreachable object nobody
> points at and we will never come back?

I do think it is true that this is the final time we'd look at these
objects. But I don't think it would be a disaster if somebody did. The
free_tree_buffer() function clears the "parsed" flag on the struct. And
anybody wishing to look at the tree contents would need to call
parse_tree(), which would then re-load the contents.

In general, that's _possibly_ less efficient if we visit the same tree
twice, but it would be balanced against not holding all of the tree data
in RAM at once. And as I said, that doesn't happen for this use case
anyway.

As a side note, IMHO having tree->buffer at all is a mistake, because it
leads to exactly this kind of confusion about when the buffer should be
discarded. We'd be better off having all callers parse directly into a
local buffer, and then clean up when they're done. It effectively ends
up the same, except then it's obvious when a tree is "leaked" because
the local buffer goes out of scope, rather than hanging around in the
global struct and just wasting memory. But that's obviously a much
bigger change.

-Peff