On Tue, Feb 20, 2024 at 12:58 PM Mário Guimarães <mario.luis.guimaraes@xxxxxxxxx> wrote: > > Hello Git community, > > please see the report below of what may be a bug. > > Yours sincerely > Mário Guimarães > > ====================================================== > Thank you for filling out a Git bug report! > Please answer the following questions to help us understand your issue. > > What did you do before the bug happened? (Steps to reproduce your issue) > > In the rust-lang/rust repository (I cloned today from GitHub), if we > run the command > > git diff-tree -r -M a04e649^2 a04e649 -- > tests/ui/issues/issue-83190.rs > tests/ui/type-alias-impl-trait/nested-rpit-with-lifetimes.rs > > we get this result > > :100644 100644 da931c3edaf6f9de6805e771f2b3b28edd27001f > 11b659eec97323ea5190dac1771c7ca3241861e7 R100 > tests/ui/issues/issue-83190.rs > tests/ui/type-alias-impl-trait/nested-rpit-with-lifetimes.rs > > However, the source and destination files of the rename are not 100% > equal. If we run this other command > > git diff -M a04e649^2 a04e649 -- tests/ui/issues/issue-83190.rs > tests/ui/type-alias-impl-trait/nested-rpit-with-lifetimes.rs > > we get the following result > > diff --git a/tests/ui/issues/issue-83190.rs > b/tests/ui/type-alias-impl-trait/nested-rpit-with-lifetimes.rs > similarity index 100% > rename from tests/ui/issues/issue-83190.rs > rename to tests/ui/type-alias-impl-trait/nested-rpit-with-lifetimes.rs > index da931c3edaf..11b659eec97 100644 > --- a/tests/ui/issues/issue-83190.rs > +++ b/tests/ui/type-alias-impl-trait/nested-rpit-with-lifetimes.rs > @@ -1,7 +1,7 @@ > -// check-pass > - > // Regression test for issue #83190, triggering an ICE in borrowck. > > +// check-pass > + > pub trait Any {} > impl<T> Any for T {} Heh, good point. And more generally, due to how the similarity checks work (split the file into lines/chunks, hash each to an integer, then sort the list of integers and compare the list of integers between two files), whenever you keep all the original lines of a file but permute their order, you will see a 100% match. Maybe a simple hack like the below would suffice to fix (untested -- anyone want to test it out for me?): diff --git a/diff.c b/diff.c index a2def45644b..6b7b3a8b9af 100644 --- a/diff.c +++ b/diff.c @@ -4405,7 +4405,7 @@ static void run_external_diff(const char *pgm, static int similarity_index(struct diff_filepair *p) { - return p->score * 100 / MAX_SCORE; + return p->score * 100 / (MAX_SCORE+1); } static const char *diff_abbrev_oid(const struct object_id *oid, int abbrev) diff --git a/diffcore-rename.c b/diffcore-rename.c index 5a6e2bcac71..3228a898e0b 100644 --- a/diffcore-rename.c +++ b/diffcore-rename.c @@ -314,7 +314,7 @@ static int find_identical_files(struct hashmap *srcs, break; } if (best) { - record_rename_pair(dst_index, best->index, MAX_SCORE); + record_rename_pair(dst_index, best->index, MAX_SCORE+1); renames++; } return renames; Which is based on the premise that 100% of source lines can match 100% of destination lines and result in a score of MAX_SCORE, so we manually set the scoring to MAX_SCORE+1 when we detect exact renames based on matching hashes and then use that higher value as the divisor when computing percentages. (Note that there are several other places where "MAX_SCORE" is used and which I'm pretty sure we would _not_ want to replace with "MAX_SCORE+1", so if you don't like my patch and want to redefine MAX_SCORE, then all those other locations would need to be adjusted instead.)