On Thu, Sep 30, 2021 at 12:46 AM Jeff King <peff@xxxxxxxx> wrote:
>
> On Thu, Sep 30, 2021 at 03:26:42AM -0400, Jeff King wrote:
>
> > > > If you remove the tmp_objdir as the primary as soon as you're done with
> > > > the merge, but before you run the diff, you might be OK, though.
> > >
> > > It has to be after I run the diff, because the diff needs access to
> > > the temporary files to diff against them.
> >
> > Right, of course. I was too fixated on the object-write part, forgetting
> > that the whole point of the exercise is to later read them back. :)
>
> Ah, no, I remember what I was trying to say here. The distinction is
> between "remove the tmp_objdir" and "remove it as the primary".
>
> I.e., if you do this:
>
>   1. create tmp_objdir
>
>   2. make tmp_objdir primary for writes
>
>   3. run the "merge" half of remerge-diff, writing objects into the
>      temporary space
>
>   4. stop having tmp_objdir as the primary; instead make it an alternate
>
>   5. run the diff
>
>   6. remove tmp_objdir totally
>
> Then step 5 can't accidentally write objects into the temporary space,
> but it can still read them. So it's not entirely safe, but it's safer,
> and it would be a much smaller change.

Interesting.

> Some ways it could go wrong:
>
>   - is it possible for the merge code to ever write an object? I kind of
>     wonder if we'd ever do any cache-able transformations as part of a
>     content-level merge. I don't think we do now, though.

Yes, of course -- otherwise there would have been no need for the
tmp_objdir in the first place. In particular, it needs to write
three-way-content merges of individual files, and it needs to write
new tree objects. (And it needs to do this both for creating the
virtual merge bases if the merge is recursive, as well as doing it for
the outer merge.) It doesn't write anything for caching reasons, such
as line ending normalizations (that's all kept in-memory and done on
demand).
>   - in step 5, write_object_file() may still be confused by the presence
>     of the to-be-thrown-away objects in the alternate. This is pretty
>     unlikely, as it implies that the remerge-diff wrote a blob or tree
>     that is byte-identical to something that the diff wants to write.

That's one reason it could be confused. The textconv filtering in
particular was creating a new object based on an existing one, and a
tree, and a ref. What if there was some other form of caching or
statistic gathering that didn't write a new object based on an
existing one, but did add trees and especially refs that referenced
the existing object? It's not that the diff wanted to write something
byte-identical to what the merge wrote, it's just that the diff wants
to reference the object somehow.
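
To make the six steps and the second failure mode concrete, here is a toy
model in Python. It is only a sketch and not git's code: ToyObjectStore and
all of its methods are hypothetical names, object ids are a plain SHA-1 of
the content rather than git's real object hashing, and the rule "skip the
write if the object is already reachable through the primary or an
alternate" is an assumption standing in for what write_object_file() does.

```python
import hashlib


class ToyObjectStore:
    """Hypothetical stand-in for an object database: one permanent
    directory plus an optional temporary one (not git's real API)."""

    def __init__(self):
        self.main = {}               # permanent object directory
        self.tmp = None              # temporary object directory, if any
        self.tmp_is_primary = False  # do new writes land in tmp?

    def create_tmp(self):                # step 1
        self.tmp = {}

    def make_tmp_primary(self):          # step 2
        self.tmp_is_primary = True

    def demote_tmp_to_alternate(self):   # step 4: still readable, no writes
        self.tmp_is_primary = False

    def remove_tmp(self):                # step 6
        self.tmp = None
        self.tmp_is_primary = False

    def has_object(self, oid):
        in_tmp = self.tmp is not None and oid in self.tmp
        return oid in self.main or in_tmp

    def write_object(self, data):
        oid = hashlib.sha1(data).hexdigest()
        # Assumed behaviour: an object that is already reachable anywhere
        # (primary or alternate) is not written a second time.
        if not self.has_object(oid):
            target = self.tmp if self.tmp_is_primary else self.main
            target[oid] = data
        return oid


def run_steps(store, merge_writes, diff_writes):
    store.create_tmp()                                       # step 1
    store.make_tmp_primary()                                 # step 2
    merged = [store.write_object(d) for d in merge_writes]   # step 3
    store.demote_tmp_to_alternate()                          # step 4
    diffed = [store.write_object(d) for d in diff_writes]    # step 5
    store.remove_tmp()                                       # step 6
    return merged, diffed


store = ToyObjectStore()
merged, diffed = run_steps(
    store,
    merge_writes=[b"merged blob"],
    # The second diff-side write is byte-identical to a merge-side object.
    diff_writes=[b"textconv cache", b"merged blob"],
)

print(store.has_object(diffed[0]))  # new diff-side object, kept in main
print(store.has_object(diffed[1]))  # write was skipped; lost with tmp
```

In this model the first diff-side object survives step 6 because it was
written to the permanent directory, while the byte-identical one does not,
since its write was skipped while the temporary directory was still visible
as an alternate. The same dangling result would follow if step 5 wrote a
tree or ref that merely named one of the merge-side object ids.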