On Thu, Sep 30, 2021 at 07:31:44PM -0700, Elijah Newren wrote: > > Some ways it could go wrong: > > > > - is it possible for the merge code to ever write an object? I kind of > > wonder if we'd ever do any cache-able transformations as part of a > > content-level merge. I don't think we do now, though. > > Yes, of course -- otherwise there would have been no need for the > tmp_objdir in the first place. In particular, it needs to write > three-way-content merges of individual files, and it needs to write > new tree objects. (And it needs to do this both for creating the > virtual merge bases if the merge is recursive, as well as doing it for > the outer merge.) Right, sorry, I should have added: ...in addition to the merge-results we're writing. What I'm getting at is whether there might be _other_ parts of the code that the merge code would invoke. Say, if a .gitattribute caused us to convert a file's encoding in order to perform a more semantic merge, and we wanted to store that result somehow (e.g., in a similar cache). I don't think anything like that exists now, but I don't find it outside the realm of possibility in the future. It's sort of the merge equivalent of "textconv"; we have external diff and external merge drivers, but being able to convert contents and feed them to the regular text diff/merge code simplifies things. > > - in step 5, write_object_file() may still be confused by the presence > > of the to-be-thrown-away objects in the alternate. This is pretty > > unlikely, as it implies that the remerge-diff wrote a blob or tree > > that is byte-identical to something that the diff wants to write. > > That's one reason it could be confused. The textconv filtering in > particular was creating a new object based on an existing one, and a > tree, and a ref. What if there was some other form of caching or > statistic gathering that didn't write a new object based on an > existing one, but did add trees and especially refs that referenced > the existing object? It's not that the diff wanted to write something > byte-identical to what the merge wrote, it's just that the diff wants > to reference the object somehow. Yes, good point. It can help a little if the alternate-odb added by tmp_objdir was marked with a flag to say "hey, this is temporary". That would help write_object_file() decide not to depend on it. But the problem is so much wider than that that I think it will always be a bit dangerous; any code could say "can I access this object" and we don't know if its intent is simply to read it, or to create a new object that references it. -Peff