On Fri, Feb 12, 2021 at 5:15 PM Junio C Hamano <gitster@xxxxxxxxx> wrote:
>
> "Elijah Newren via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes:
>
> > From: Elijah Newren <newren@xxxxxxxxx>
> >
> > Add a simple test where a removed file is similar to two different added
> > files; one of them has the same basename, and the other has a slightly
> > higher content similarity. Without break detection, filename similarity
> > of 100% trumps content similarity for pairing up related files. For
> > any filename similarity less than 100%, the opposite is true -- content
> > similarity is all that matters. Add a testcase that documents this.
>
> I am not sure why it is the "opposite". When contents are similar
> to the same degree of 100%, we tiebreak with the filename. We never
> favor a pair between the same filename over a pair between different
> filenames with better content similarity.

This is not true. If src/main.c is 99% similar to src/foo.c, and is
0% similar to the src/main.c in the new commit, we match the old
src/main.c to the new src/main.c despite it being far more similar to
src/foo.c. Unless break detection is turned on, we do not allow
content similarity to trump (full) filename equality.

> And when contents are similar to the same degree of less than 100%,
> we do not favor a pair between the same filename over a pair between
> different filenames, as long as they are similar to the same degree.

This is also not true; we tiebreak with filenames for inexact renames
just like we do for exact renames (note that basename_same() is called
both from find_identical_files() and from the nested loop where
inexact rename detection is done).

> So, I do not think "opposite" is helping readers to understand what
> is going on.
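
To make the ordering concrete, here is a toy model in Python of the
behavior described above. It is only a sketch, not the code in
diffcore-rename.c: pair_files(), its similarity callback, and the
min_score default of 50 are names and values made up for illustration,
and it ignores break detection, copies, and rename limits.

```python
import os


def pair_files(old_paths, new_paths, similarity, min_score=50):
    """Toy model of rename pairing without break detection.

    `similarity(src, dst)` is a hypothetical callback returning 0..100.
    Returns {old_path: (new_path, status)}, status "M" or "R".
    """
    # Step 1: a path present on both sides is paired with itself,
    # no matter how dissimilar the contents are.
    same_path = set(old_paths) & set(new_paths)
    result = {path: (path, "M") for path in same_path}

    # Step 2: only the leftover deletions and additions are rename
    # candidates; collect every pair that clears the threshold.
    sources = [p for p in old_paths if p not in same_path]
    dests = [p for p in new_paths if p not in same_path]
    candidates = []
    for src in sources:
        for dst in dests:
            score = similarity(src, dst)
            if score >= min_score:
                same_base = os.path.basename(src) == os.path.basename(dst)
                candidates.append((score, same_base, src, dst))

    # Step 3: best content similarity first; a matching basename only
    # breaks ties between equal scores.
    candidates.sort(key=lambda c: (c[0], c[1]), reverse=True)
    used_dst = set()
    for _score, _same_base, src, dst in candidates:
        if src in result or dst in used_dst:
            continue
        result[src] = (dst, "R")
        used_dst.add(dst)
    return result


# The numbers from the code comment in the testcase quoted below.
scores = {
    ("subdir/file.txt", "file.md"): 89,
    ("subdir/file.txt", "file.txt"): 78,
}
print(pair_files(["subdir/file.txt"], ["file.txt", "file.md"],
                 lambda s, d: scores.get((s, d), 0)))
# {'subdir/file.txt': ('file.md', 'R')}
```

With the src/main.c example, step 1 pairs the old and new src/main.c
before src/foo.c is ever considered, so the 99% score never comes into
play.
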
>
> > +test_expect_success 'basename similarity vs best similarity' '
> > +	mkdir subdir &&
> > +	test_write_lines line1 line2 line3 line4 line5 \
> > +			  line6 line7 line8 line9 line10 >subdir/file.txt &&
> > +	git add subdir/file.txt &&
> > +	git commit -m "base txt" &&
> > +
> > +	git rm subdir/file.txt &&
> > +	test_write_lines line1 line2 line3 line4 line5 \
> > +			  line6 line7 line8 >file.txt &&
> > +	test_write_lines line1 line2 line3 line4 line5 \
> > +			  line6 line7 line8 line9 >file.md &&
> > +	git add file.txt file.md &&
> > +	git commit -a -m "rename" &&
> > +	git diff-tree -r -M --name-status HEAD^ HEAD >actual &&
> > +	# subdir/file.txt is 89% similar to file.md, 78% similar to file.txt,
> > +	# but since same basenames are checked first...
>
> I am not sure what the second line of this comment wants to imply
> with the ellipses here. Care to finish the sentence?
>
> Or was the second line planned to be added when we start applying
> the "check only the same filename first and see if we find a
> better-than-reasonable match" heuristics but somehow survived
> "rebase -i" and ended up here?

Oops, indeed; that is precisely what happened. Will fix.

> > +	cat >expected <<-\EOF &&
> > +	R088	subdir/file.txt	file.md
> > +	A	file.txt
> > +	EOF
> > +	test_cmp expected actual
>
> Thanks.