Re: [PATCH 1/3] fsck: free tree buffers after walking unreachable objects

Jeff King <peff@xxxxxxxx> · Thu, 22 Sep 2022 18:16:12 -0400

On Thu, Sep 22, 2022 at 12:27:33PM -0700, Junio C Hamano wrote:

> > As a side note, IMHO having tree->buffer at all is a mistake, because it
> > leads to exactly this kind of confusion about when the buffer should be
> > discarded. We'd be better off having all callers parse directly into a
> > local buffer, and then clean up when they're done.
> 
> Yeah, tree-walk.c users woud use tree_desc structure anyway, and
> instead of having a moving pointer that points into a separate thing
> (i.e. tree->buffer), it could have its own copy of the "whole buffer"
> that can be used to free when it is done iterating over entries.
> 
> > .... But that's obviously a much bigger change.
> 
> Yup.

I took a (very) brief stab at this, out of curiosity. The sticking point
becomes obvious very quickly: how do we get the buffer to the caller? If
you are calling parse_tree(), we can add new out-parameters to provide
the buffer. But something like parse_object() is just returning an
object struct, and we have to stuff anything we want to communicate to
the caller inside the polymorphic struct which contains it.

We could split the concept of "parse" away from "get the buffer"
entirely, but then we have a potential slowdown. The "parse" functions
really want to open the object contents and check the hash (and removing
that in the general case would probably break part of fsck, at least).
So we'd end up inflating the object contents twice, which would probably
have a measurable impact.

I don't plan on digging any further on it for now, so this is just a
note for future people who do. :)

-Peff