Re: [PATCH 5/7] tmp-objdir: new API for creating and removing primary object dirs

Jeff King <peff@xxxxxxxx> · Fri, 1 Oct 2021 01:21:47 -0400

On Thu, Sep 30, 2021 at 07:31:44PM -0700, Elijah Newren wrote:

> > Some ways it could go wrong:
> >
> >   - is it possible for the merge code to ever write an object? I kind of
> >     wonder if we'd ever do any cache-able transformations as part of a
> >     content-level merge. I don't think we do now, though.
> 
> Yes, of course -- otherwise there would have been no need for the
> tmp_objdir in the first place.  In particular, it needs to write
> three-way-content merges of individual files, and it needs to write
> new tree objects.  (And it needs to do this both for creating the
> virtual merge bases if the merge is recursive, as well as doing it for
> the outer merge.)

Right, sorry, I should have added: ...in addition to the merge-results
we're writing. What I'm getting at is whether there might be _other_
parts of the code that the merge code would invoke. Say, if a
.gitattribute caused us to convert a file's encoding in order to perform
a more semantic merge, and we wanted to store that result somehow (e.g.,
in a similar cache).

I don't think anything like that exists now, but I don't find it outside
the realm of possibility in the future. It's sort of the merge
equivalent of "textconv"; we have external diff and external merge
drivers, but being able to convert contents and feed them to the regular
text diff/merge code simplifies things.

> >   - in step 5, write_object_file() may still be confused by the presence
> >     of the to-be-thrown-away objects in the alternate. This is pretty
> >     unlikely, as it implies that the remerge-diff wrote a blob or tree
> >     that is byte-identical to something that the diff wants to write.
> 
> That's one reason it could be confused.  The textconv filtering in
> particular was creating a new object based on an existing one, and a
> tree, and a ref.  What if there was some other form of caching or
> statistic gathering that didn't write a new object based on an
> existing one, but did add trees and especially refs that referenced
> the existing object?  It's not that the diff wanted to write something
> byte-identical to what the merge wrote, it's just that the diff wants
> to reference the object somehow.

Yes, good point. It can help a little if the alternate-odb added by
tmp_objdir was marked with a flag to say "hey, this is temporary". That
would help write_object_file() decide not to depend on it. But the
problem is so much wider than that that I think it will always be a bit
dangerous; any code could say "can I access this object" and we don't
know if its intent is simply to read it, or to create a new object that
references it.

-Peff