Re: [PATCH 5/7] tmp-objdir: new API for creating and removing primary object dirs

Elijah Newren <newren@xxxxxxxxx> · Thu, 30 Sep 2021 19:31:44 -0700

On Thu, Sep 30, 2021 at 12:46 AM Jeff King <peff@xxxxxxxx> wrote:
>
> On Thu, Sep 30, 2021 at 03:26:42AM -0400, Jeff King wrote:
>
> > > > If you remove the tmp_objdir as the primary as soon as you're done with
> > > > the merge, but before you run the diff, you might be OK, though.
> > >
> > > It has to be after I run the diff, because the diff needs access to
> > > the temporary files to diff against them.
> >
> > Right, of course. I was too fixated on the object-write part, forgetting
> > that the whole point of the exercise is to later read them back. :)
>
> Ah, no, I remember what I was trying to say here. The distinction is
> between "remove the tmp_objdir" and "remove it as the primary".
>
> I.e., if you do this:
>
>   1. create tmp_objdir
>
>   2. make tmp_objdir primary for writes
>
>   3. run the "merge" half of remerge-diff, writing objects into the
>      temporary space
>
>   4. stop having tmp_objdir as the primary; instead make it an alternate
>
>   5. run the diff
>
>   6. remove tmp_objdir totally
>
> Then step 5 can't accidentally write objects into the temporary space,
> but it can still read them. So it's not entirely safe, but it's safer,
> and it would be a much smaller change.

Interesting.
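
Just to check that I'm reading the six steps the way you intend,
here's a toy model of the sequence.  This is Python, not git code:
ObjectStore, its fields, and the object names are all made up for
illustration, and the "skip the write if the object is already
readable" check is only my rough approximation of what
write_object_file() does.

```python
class ObjectStore:
    """Toy object database: one primary dir for writes, plus read-only alternates."""

    def __init__(self):
        self.main = {}            # permanent objects: oid -> content
        self.primary = self.main  # where new objects are written
        self.alternates = []      # extra places consulted on reads

    def read(self, oid):
        # Reads look everywhere: primary, the permanent store, then alternates.
        for odb in [self.primary, self.main, *self.alternates]:
            if oid in odb:
                return odb[oid]
        return None

    def write(self, oid, content):
        # Assumed simplification: skip the write if the object is readable.
        if self.read(oid) is None:
            self.primary[oid] = content


store = ObjectStore()
store.main["base"] = "existing blob"

tmp = {}                                        # 1. create tmp_objdir
store.primary = tmp                             # 2. make it primary for writes
store.write("merged", "remerge result")         # 3. merge half writes into tmp
store.primary = store.main                      # 4. stop being primary...
store.alternates.append(tmp)                    #    ...become an alternate
merged = store.read("merged")                   # 5. diff can still read tmp
store.write("textconv", "cached diff output")   #    diff writes land in main
store.alternates.remove(tmp)                    # 6. remove tmp_objdir totally
tmp.clear()
```

In this model the diff's own write ends up in the permanent store, and
the merge result is readable during step 5 but gone after step 6.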

> Some ways it could go wrong:
>
>   - is it possible for the merge code to ever write an object? I kind of
>     wonder if we'd ever do any cache-able transformations as part of a
>     content-level merge. I don't think we do now, though.

Yes, of course -- otherwise there would have been no need for the
tmp_objdir in the first place.  In particular, it needs to write
three-way content merges of individual files, and it needs to write
new tree objects.  (And it needs to do this both when creating the
virtual merge bases, if the merge is recursive, and for the outer
merge.)

It doesn't write anything for caching reasons, such as line ending
normalizations (that's all kept in memory and done on demand).

>   - in step 5, write_object_file() may still be confused by the presence
>     of the to-be-thrown-away objects in the alternate. This is pretty
>     unlikely, as it implies that the remerge-diff wrote a blob or tree
>     that is byte-identical to something that the diff wants to write.

That's one reason it could be confused.  The textconv filtering in
particular was creating a new object based on an existing one, plus a
tree and a ref.  What if there was some other form of caching or
statistics gathering that didn't write a new object based on an
existing one, but did add trees and especially refs that referenced
the existing object?  It's not that the diff wanted to write something
byte-identical to what the merge wrote, it's just that the diff wants
to reference the object somehow.
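
A toy illustration of the failure mode I'm worried about (again
Python rather than git code; the dicts, the ref name, and
find_dangling() are hypothetical, and the "already readable, so skip
the write" check is an assumed simplification):

```python
def find_dangling(refs, *odbs):
    """Return the ref names whose target oid is in none of the given object dirs."""
    return sorted(name for name, oid in refs.items()
                  if not any(oid in odb for odb in odbs))


main = {"a1": "committed blob"}                  # permanent object store
tmp = {"m1": "blob written by the merge step"}   # tmp_objdir, now an alternate
refs = {}


def write_object(oid, content):
    # Assumed simplification: skip the write when the object is already
    # readable from the primary store or from the alternate.
    if oid not in main and oid not in tmp:
        main[oid] = content


# Step 5: the diff side "writes" m1 (skipped, since tmp already has it)
# and records a ref pointing at it.  The ref name is hypothetical.
write_object("m1", "blob written by the merge step")
refs["refs/example/cache"] = "m1"

before = find_dangling(refs, main, tmp)   # everything resolves for now

tmp.clear()                               # step 6: remove tmp_objdir totally
after = find_dangling(refs, main)         # the ref now points at nothing
```

Here the ref survives step 6 but its target does not, since the only
copy of m1 lived in the temporary space.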