On Tue, Sep 24, 2024 at 05:54:34PM -0400, Jeff King wrote: > When we parse a commit via repo_parse_commit_internal(), if > save_commit_buffer is set we'll stuff the buffer of the object contents > into a cache, overwriting any previous value. Interesting. I saw some cases that I think could be caused by this, but couldn't make much sense of them. > This can result in a leak of that previously cached value, though it's > rare in practice. If we have a value in the cache it would have come > from a previous parse, and during that parse we'd set the object.parsed > flag, causing any subsequent parse attempts to exit without doing any > work. > > But it's possible to "unparse" a commit, which we do when registering a > commit graft. And since shallow fetches are implemented using grafts, > the leak is triggered in practice by t5539. > > There are a number of possible ways to address this: > > 1. the unparsing function could clear the cached commit buffer, too. I s/the/The/ > think this would work for the case I found, but I'm not sure if > there are other ways to end up in the same state (an unparsed > commit with an entry in the commit buffer cache). > > 2. when we parse, we could check the buffer cache and prefer it to s/when/When/ > reading the contents from the object database. In theory the > contents of a particular sha1 are immutable, but the code in > question is violating the immutability with grafts. So this > approach makes me a bit nervous, although I think it would work in > practice (the grafts are applied to what we parse, but we still > retain the original contents). > > 3. We could realize the cache is already populated and discard its > contents before overwriting. It's possible some other code could be > holding on to a pointer to the old cache entry (and we'd introduce > a use-after-free), but I think the risk of that is relatively low. > > 4. The reverse of (3): when the cache is populated, don't bother > saving our new copy. This is perhaps a little weird, since we'll > have just populated the commit struct based on a different buffer. > But the two buffers should be the same, even in the presence of > grafts (as in (2) above). > > I went with option 4. It addresses the leak directly and doesn't carry > any risk of breaking other assumptions. And it's the same technique used > by parse_object_buffer() for this situation, though I'm not sure when it > would even come up there. The extra safety has been there since > bd1e17e245 (Make "parse_object()" also fill in commit message buffer > data., 2005-05-25). Okay. This feels a bit weird indeed, but the fact that we already use the same strategy in other places makes me feel way safer. > This lets us mark t5539 as leak-free. > > Signed-off-by: Jeff King <peff@xxxxxxxx> > --- > commit.c | 3 ++- > t/t5539-fetch-http-shallow.sh | 1 + > 2 files changed, 3 insertions(+), 1 deletion(-) > > diff --git a/commit.c b/commit.c > index 3a54e4db0d..cc03a93036 100644 > --- a/commit.c > +++ b/commit.c > @@ -595,7 +595,8 @@ int repo_parse_commit_internal(struct repository *r, > } > > ret = parse_commit_buffer(r, item, buffer, size, 0); > - if (save_commit_buffer && !ret) { > + if (save_commit_buffer && !ret && > + !get_cached_commit_buffer(r, item, NULL)) { > set_commit_buffer(r, item, buffer, size); > return 0; > } And the fix is trivial. Patrick