On Tue, Jun 7, 2022 at 5:11 PM Glen Choo <chooglen@xxxxxxxxxx> wrote: > > (I'm not 100% what the bug _is_, only that there is one.) > > = Report > > At $DAYJOB, there was a report that "git merge" was failing on certain > branches. Fortunately, the repo is publicly accessible, so I can share > the full reproduction recipe: > > git clone https://android.googlesource.com/platform/external/tensorflow tensorflow && > cd tensorflow && > git merge origin/upstream-master # HEAD is at origin/master > > This gives: > > Performing inexact rename detection: 100% (4371280/4371280), done. > Performing inexact rename detection: 100% (12529218/12529218), done. > Assertion failed: (ci->filemask == 2 || ci->filemask == 4), function apply_directory_rename_modifications, file merge-ort.c, line 2410. > > This bug seems specific to merge-ort; "git merge -s recursive > origin/upstream-master" seems to work as expected. > > In case the branches have changed since then, here are the commit ids: > > $ git rev-parse origin/master > 68e55281824e8a79fa67e1a3061f39bd4c4b2e57 > $ git rev-parse origin/upstream-master > 0be5bb09aeeff3a6825842326fadc8159a5553ab > $ git merge-base 68e55281824e8a79fa67e1a3061f39bd4c4b2e57 0be5bb09aeeff3a6825842326fadc8159a5553ab > 8e819019081f39d83df42baba4acfced3abf3f90 > > = Interesting info > > I don't understand the merge-ort code enough to understand what's going > on, but I was able to find some (hopefully helpful) details. I added > this log line just above the offending assert() call: > > trace2_printf("0 %s, 1 %s, 2 %s, fm %d, dm %d", ci->pathnames[0], > ci->pathnames[1], ci->pathnames[2], ci->filemask, ci->dirmask); > > Here are the lines I thought were suspicious: > > 0 <path1>, 1 <path1>, 2 <path1>, fm 2, dm 0 > [...] > 0 <path2>, 1 <path1>, 2 <path2>, fm 6, dm 0 # this is the last line > > Notice that the last line detected a rename from <path2> to <path1>, but > we already saw <path1> earlier. > > IIUC "(ci->filemask == 2 || ci->filemask == 4)" can be read as "the path > either exists on only the left side or only the right side of the > merge", so ci->filemask == 6 should mean "the path exists on both sides > of the merge"? > > "-s recursive" seems to handle the rename just fine (it picks <path2> > IIRC). > > I also dug into each commit to see which paths were present: > > head="origin/master" > other="origin/upstream-master" > merge_base="$(git merge-base origin/master origin/upstream-master)" > path1="tensorflow/lite/g3doc/convert/metadata_writer_tutorial.ipynb" > path2="tensorflow/lite/g3doc/models/convert/metadata_writer_tutorial.ipynb" > > git rev-parse "$head:$path1" # (exists) > git rev-parse "$head:$path2" # (doesn't exist) > > git rev-parse "$other:$path1" # (doesn't exist) > git rev-parse "$other:$path2" # (exists) > > git rev-parse "$merge_base:$path1" # (doesn't exist) > git rev-parse "$merge_base:$path2" # (doesn't exist) > > i.e. both files are new and are renames of each other. I haven't tried > using this property to create a minimally-reproducing recipe though. Thanks for the detailed report; very cool. Interestingly, if you reverse the direction of the merge (checkout origin/upstream-master and merge origin/master) then you get a different error: error: cache entry has null sha1: tensorflow/lite/g3doc/examples/convert/metadata_writer_tutorial.ipynb fatal: unable to write .git/index This merge has a lot of stuff going on, including some big directory renames, new files, loads of conflicts, etc. I think the relevant bits are the following: merge base -> side1: Rename tensorflow/lite/g3doc/models/ -> tensorflow/lite/g3doc/examples Add tensorflow/lite/g3doc/convert/metadata_writer_tutorial.ipynb merge base -> side2: Rename tensorflow/lite/g3doc/convert/ -> tensorflow/lite/g3doc/models/convert/ Add tensorflow/lite/g3doc/models/convert/metadata_writer_tutorial.ipynb So the combination of the above means: side1:tensorflow/lite/g3doc/convert/metadata_writer_tutorial.ipynb would be affected by the side2 directory rename to become tensorflow/lite/g3doc/models/convert/metadata_writer_tutorial.ipynb which would match the name of the file on side2, BUT the side2 version of that file, i.e.: side2:tensorflow/lite/g3doc/models/convert/metadata_writer_tutorial.ipynb would be affected by the side1 directory rename to become tensorflow/lite/g3doc/examples/convert/metadata_writer_tutorial.ipynb and what becomes of the side1 version of that file? It gets messy... I have a small reproduction recipe (using the style from t6423 to explain it): # Commit O: sub1/file, sub2/other # Commit A: sub3/file, sub2/{other, new_add_add_file_1} # Commit B: sub1/{file, newfile}, sub1/sub2/{other, new_add_add_file_2} # # In words: # A: sub1/ -> sub3/, add sub2/new_add_add_file_1 # B: sub2/ -> sub1/sub2, add sub1/newfile, add sub1/sub2/new_add_add_file_2 This small reproduction recipe triggers the same assertion-failure you reported; further, when the merge direction is reversed this testcase shows the same alternative error I showed above for your bigger testcase. However, interestingly, this simple testcase also triggers those same null-sha1 and unable-to-write-.git/index errors in merge-recursive, regardless of the direction of the merge. I don't know why my testcase triggers bugs in merge-recursive and your bigger testcase doesn't, but I wasn't too motivated to find out either; I was mostly focused on trying to understand the merge-ort side of things. Now, there's code in both merge-recursive and merge-ort for avoiding doubly transitive renames (i.e. one side renames directory A/ -> B/, and the other side renames directory B/ -> C/, and even worse if the original side also renames C/ -> D/, because this combination would otherwise make a mess for new files added to A/ on the first side and wondering which directory they end up in). However, this is a testcase where a _leading directory_ of B/ is renamed to C/, which sidesteps the normal doubly transitive rename check, and then the code heads down what looks like the wrong path until it is caught by the assertion check you reported. In addition to the funny-doubly-transitive-rename (involving the leading directory), your testcase also has both sides add a file, one side to A/ and the other side to B/ (with the same basename but different file contents), which adds to the "fun". Anyway, long story short...I don't have a fix yet, but just thought I'd mention I saw the email and spent some hours digging in.