Re: [PATCH 3/3] merge-ort: fix issue with dual rename and add/add conflict

Jonathan Tan <jonathantanmy@xxxxxxxxxx> · Wed, 6 Jul 2022 10:25:05 -0700

Sorry for getting back to this so late. I don't have any issues with the
patches, but just to close the loop:

Elijah Newren <newren@xxxxxxxxx> writes:
> On Mon, Jun 27, 2022 at 11:47 AM Jonathan Tan <jonathantanmy@xxxxxxxxxx> wrote:
> >
> > "Elijah Newren via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes:
> > > The testcases added in t6423 a couple commits ago are slightly different
> > > but similar in principle.  They involve a similar case of paired
> > > renaming but instead of A/ -> B/ and B/ -> C/, the second side renames
> > > a leading directory of B/ to C/.
> >
> > Hmm...one side moved sub1 -> sub3 and the other moved sub2 from the root
> > to under sub1, right? So it's more like A/ -> B/ and C/ -> A/C/.
> 
> Hmm, maybe I should have been more explicit in my mapping:
>    A = sub2
>    B = sub1/sub2
>    leading directory of B = sub1
>    C = sub3

Substituting into A/ -> B/ and "a leading directory of B/ to C/", we
get:

  sub2 -> sub1/sub2 and sub1 -> sub3

which is indeed what is happening. I see, thanks.

> > > And both sides add a new file
> > > somewhere under the directory that the other side will rename.
> >
> > Rename or move, I think.
> 
> I'm confused; I don't understand what this comment means.  Renames
> tend to be created using "git mv", before or after making content
> changes, so to me a file being renamed or a file being moved to a
> different location are synonymous.  What distinction are you making
> between renames and moves?

I was thinking that a rename means that the directory still has the same
parent directory, whereas a move means that the directory keeps its
basename but has a different parent directory. Maybe it's just the way I
think about things, but the important thing here is that a directory was
moved so that its parent directory is a directory that is different and
that already exists, and I think that this meaning is lost when we say
"rename". But it might just be me.

> Hmm, perhaps I should change this to:
> 
> /* Cases where we don't have or don't want a directory rename for this
> path, so we return NULL */
> 
> The purpose of this function is to check whether a given path would be
> modified by directory renaming to get a new path.  So, "no new path"
> means we don't have an applicable directory rename or don't want to
> use it.

I see - the new text makes sense.

> > >       if (strmap_empty(dir_renames))
> > >               return new_path;
> > > +     if (strmap_get(&collisions[other_side], path))
> > > +             return new_path;
> >
> > So as far as I understand, collisions here, for a given side, is a map.
> > The map's keys are all the filenames of added and renamed files (I'm
> > assuming that's what 'A' and 'R' are) from that side after any directory
> > moves on the other side are applied. So, for example, if we add "foo/a"
> > on side A and rename "foo/" to "bar/" on side B, then side A's collision
> > map would have a key "bar/a". So I'm not sure if "collision" is the
> > right name here, but the existing code already uses it so I'll leave it
> > be.
> 
> Let's take your example a bit further, to discuss the original kind of
> usecase that "collisions" was written for.  In addition to adding
> "foo/a" on side A, we also add "foo2/a" and "foo3/a".  And then in
> addition to renaming "foo/" to "bar/" on side B, we also rename
> "foo2/" to "bar/" and "foo3/" to "bar/", thus merging all of foo/,
> foo2/, and foo3/ into a single directory named bar/.  If the files in
> foo/, foo2/, and foo3/ on side B were all unique, you can see how
> there'd be no problem merging these directories together.  The problem
> only comes at merge time when you attempt to apply the directory
> renames from side B to the new files on side A.  That's when you get
> collisions, in this case three files that would be named bar/a.
> 
> Checking for such collisions was the purpose of the "collisions"
> metadata, so I think the name is fitting.  The only problem is that
> we're reusing that data now for a slightly different purpose, though I
> think it's still "collision-y".

That makes sense, thanks.

> > It makes sense that this situation (of side A having "bar/a" because
> > side B renamed "foo/" to "bar/", and at the same time, side B adds its
> > own "bar/a") is weird, as stated in the commit message, so I don't mind
> > disabling checking for directory rename in this case.  However, in
> > theory, I don't see how disabling this check would fix anything, since
> > the bug seems to be about two different files on different sides being
> > renamed to the same filename through some convoluted means. (Unless this
> > is the only convoluted means to do it, and disabling it means that the
> > bug wouldn't occur.)

[snip]

> Hmm...let me see if I can explain this a different way.
> 
> The short version of the issue here is that if a directory rename
> wants to rename NEWFILE -> ALTERNATE_NEWFILE, but there is another
> directory rename on the other side of history that wants to rename
> ANOTHER_FILE -> NEWFILE, then we run into problems and have to turn
> off the NEWFILE -> ALTERNATE_NEWFILE.  This check here is all about
> this case.
> 
> To see why this is the problematic setup...
> 
> Now a conflict_info
> stores information about the OIDs and modes for all three sides of the
> merge (i.e. both sides of the merge and the base).  Whenever a rename
> is processed, we have to update this map, because the rename makes the
> conflict_info now apply to a different path.  In the simple cases, the
> conflict_info from the source path needs to be merged with the
> conflict_info for the target path, and the source path's conflict_info
> needs to be marked as null (literally setting the .is_null field to
> 1).  rename/rename and such can make this a bit more complicated.

Ah, I think I was missing this part. The intention is that processing
one side in this way (and thus modifying its conflict_info) would not
affect the processing of any other sides, but there is a bug that makes
it not so. (And the intention is not, say, when processing all sides,
to write the output in a new data structure so that the result is the
same no matter the order.)

So I think the answer to my question is: no, we do not want to be able
to rename two different files on different sides to the same filename
through any convoluted means, and if that happens, it's a bug.

[snip illustration of what happens in either merge order]

> To avoid this order dependence, and the weird multiple-renaming of a
> single path, we just want to turn off directory renames when the
> source of a directory rename on one side is the target of a directory
> rename on the other.  That's what this patch does.
> 
> 
> Does that help?  Or did I make it more confusing?

I think that helped, thanks.