On Thu, Jan 11, 2024 at 3:00 PM Junio C Hamano <gitster@xxxxxxxxx> wrote: > > "Elijah Newren via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes: > > > diff --git a/diffcore-delta.c b/diffcore-delta.c > > index c30b56e983b..7136c3dd203 100644 > > --- a/diffcore-delta.c > > +++ b/diffcore-delta.c > > @@ -159,6 +159,10 @@ static struct spanhash_top *hash_chars(struct repository *r, > > n = 0; > > accum1 = accum2 = 0; > > } > > + if (n > 0) { > > + hashval = (accum1 + accum2 * 0x61) % HASHBASE; > > + hash = add_spanhash(hash, hashval, n); > > + } > > OK, so we were ignoring the final short bit that is not terminated > with LF due to the "continue". Nicely found. > > > diff --git a/t/t4001-diff-rename.sh b/t/t4001-diff-rename.sh > > index 85be1367de6..29299acbce7 100755 > > --- a/t/t4001-diff-rename.sh > > +++ b/t/t4001-diff-rename.sh > > @@ -286,4 +286,23 @@ test_expect_success 'basename similarity vs best similarity' ' > > test_cmp expected actual > > ' > > > > +test_expect_success 'last line matters too' ' > > + test_write_lines a 0 1 2 3 4 5 6 7 8 9 >nonewline && > > + printf "git ignores final up to 63 characters if not newline terminated" >>nonewline && > > + git add nonewline && > > + git commit -m "original version of file with no final newline" && > > I found it misleading that the file whose name is nonewline has > bunch of LF including on its last line ;-). Yeah, sorry. It's been a while, but I think I started with a file with only one line that had no LF, but then realized for inexact rename detection to kick in I needed some other lines, at least one of which didn't match...but I forgot to update the filename after adding the other lines... > > + # Change ONLY the first character of the whole file > > + test_write_lines b 0 1 2 3 4 5 6 7 8 9 >nonewline && > > Same here, but it is too much to bother rewriting it ... > > { > test_write_lines ... > printf ... > } >incomplete > > ... like so ("incomplete" stands for "file ending with an incomplete line"), > so I'll let it pass. > > > + printf "git ignores final up to 63 characters if not newline terminated" >>nonewline && > > > > + git add nonewline && > > + git mv nonewline still-no-newline && > > + git commit -a -m "rename nonewline -> still-no-newline" && > > + git diff-tree -r -M01 --name-status HEAD^ HEAD >actual && > > + cat >expected <<-\EOF && > > + R097 nonewline still-no-newline > > I am not very happy with the hardcoded 97. You are already using > the non-standard 10% threshold. If the delta detection that > forgets about the last line is so broken as your proposed log > message noted, shouldn't you be able to construct a sample pair of > preimage and postimage for which the broken version gives so low > similarity to be judged not worth treating as a rename, while the > fixed version gives reasonable similarity to be made into a rename, > by the default threshold? That way, the test only needs to see if > we got a rename (with any similarity) or a delete and an add. Oops, the threshold is entirely unnecessary here; not sure why I didn't remember to take it out (originally used the threshold while testing without the fix to just how low of a similarity git thought these nearly identical files had). Since you don't like the threshold, and since we don't seem to have a summary format that reports on the rename without the percentage, I guess I need to munge the output with sed: sed -e "s/^R[0-9]* /R /" actual >actual.munged && and then compare the expected output to that. I'll send in a patch doing so and fix up the filenames and drop the rename threshold while at it.