Sorry for getting back to this so late. I don't have any issues with the patches, but just to close the loop: Elijah Newren <newren@xxxxxxxxx> writes: > On Mon, Jun 27, 2022 at 11:47 AM Jonathan Tan <jonathantanmy@xxxxxxxxxx> wrote: > > > > "Elijah Newren via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes: > > > The testcases added in t6423 a couple commits ago are slightly different > > > but similar in principle. They involve a similar case of paired > > > renaming but instead of A/ -> B/ and B/ -> C/, the second side renames > > > a leading directory of B/ to C/. > > > > Hmm...one side moved sub1 -> sub3 and the other moved sub2 from the root > > to under sub1, right? So it's more like A/ -> B/ and C/ -> A/C/. > > Hmm, maybe I should have been more explicit in my mapping: > A = sub2 > B = sub1/sub2 > leading directory of B = sub1 > C = sub3 Substituting into A/ -> B/ and "a leading directory of B/ to C/", we get: sub2 -> sub1/sub2 and sub1 -> sub3 which is indeed what is happening. I see, thanks. > > > And both sides add a new file > > > somewhere under the directory that the other side will rename. > > > > Rename or move, I think. > > I'm confused; I don't understand what this comment means. Renames > tend to be created using "git mv", before or after making content > changes, so to me a file being renamed or a file being moved to a > different location are synonymous. What distinction are you making > between renames and moves? I was thinking that a rename means that the directory still has the same parent directory, whereas a move means that the directory keeps its basename but has a different parent directory. Maybe it's just the way I think about things, but the important thing here is that a directory was moved so that its parent directory is a directory that is different and that already exists, and I think that this meaning is lost when we say "rename". But it might just be me. > Hmm, perhaps I should change this to: > > /* Cases where we don't have or don't want a directory rename for this > path, so we return NULL */ > > The purpose of this function is to check whether a given path would be > modified by directory renaming to get a new path. So, "no new path" > means we don't have an applicable directory rename or don't want to > use it. I see - the new text makes sense. > > > if (strmap_empty(dir_renames)) > > > return new_path; > > > + if (strmap_get(&collisions[other_side], path)) > > > + return new_path; > > > > So as far as I understand, collisions here, for a given side, is a map. > > The map's keys are all the filenames of added and renamed files (I'm > > assuming that's what 'A' and 'R' are) from that side after any directory > > moves on the other side are applied. So, for example, if we add "foo/a" > > on side A and rename "foo/" to "bar/" on side B, then side A's collision > > map would have a key "bar/a". So I'm not sure if "collision" is the > > right name here, but the existing code already uses it so I'll leave it > > be. > > Let's take your example a bit further, to discuss the original kind of > usecase that "collisions" was written for. In addition to adding > "foo/a" on side A, we also add "foo2/a" and "foo3/a". And then in > addition to renaming "foo/" to "bar/" on side B, we also rename > "foo2/" to "bar/" and "foo3/" to "bar/", thus merging all of foo/, > foo2/, and foo3/ into a single directory named bar/. If the files in > foo/, foo2/, and foo3/ on side B were all unique, you can see how > there'd be no problem merging these directories together. The problem > only comes at merge time when you attempt to apply the directory > renames from side B to the new files on side A. That's when you get > collisions, in this case three files that would be named bar/a. > > Checking for such collisions was the purpose of the "collisions" > metadata, so I think the name is fitting. The only problem is that > we're reusing that data now for a slightly different purpose, though I > think it's still "collision-y". That makes sense, thanks. > > It makes sense that this situation (of side A having "bar/a" because > > side B renamed "foo/" to "bar/", and at the same time, side B adds its > > own "bar/a") is weird, as stated in the commit message, so I don't mind > > disabling checking for directory rename in this case. However, in > > theory, I don't see how disabling this check would fix anything, since > > the bug seems to be about two different files on different sides being > > renamed to the same filename through some convoluted means. (Unless this > > is the only convoluted means to do it, and disabling it means that the > > bug wouldn't occur.) [snip] > Hmm...let me see if I can explain this a different way. > > The short version of the issue here is that if a directory rename > wants to rename NEWFILE -> ALTERNATE_NEWFILE, but there is another > directory rename on the other side of history that wants to rename > ANOTHER_FILE -> NEWFILE, then we run into problems and have to turn > off the NEWFILE -> ALTERNATE_NEWFILE. This check here is all about > this case. > > To see why this is the problematic setup... > > Now a conflict_info > stores information about the OIDs and modes for all three sides of the > merge (i.e. both sides of the merge and the base). Whenever a rename > is processed, we have to update this map, because the rename makes the > conflict_info now apply to a different path. In the simple cases, the > conflict_info from the source path needs to be merged with the > conflict_info for the target path, and the source path's conflict_info > needs to be marked as null (literally setting the .is_null field to > 1). rename/rename and such can make this a bit more complicated. Ah, I think I was missing this part. The intention is that processing one side in this way (and thus modifying its conflict_info) would not affect the processing of any other sides, but there is a bug that makes it not so. (And the intention is not, say, when processing all sides, to write the output in a new data structure so that the result is the same no matter the order.) So I think the answer to my question is: no, we do not want to be able to rename two different files on different sides to the same filename through any convoluted means, and if that happens, it's a bug. [snip illustration of what happens in either merge order] > To avoid this order dependence, and the weird multiple-renaming of a > single path, we just want to turn off directory renames when the > source of a directory rename on one side is the target of a directory > rename on the other. That's what this patch does. > > > Does that help? Or did I make it more confusing? I think that helped, thanks.