On Wed, Feb 10, 2021 at 8:41 AM Junio C Hamano <gitster@xxxxxxxxx> wrote:
>
> "Elijah Newren via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes:
>
> > From: Elijah Newren <newren@xxxxxxxxx>
> >
> > The last few patches have introduced a new preliminary step when rename
> > detection is on but both break detection and copy detection are off.
> > Document this new step. While we're at it, add a testcase that checks
> > the new behavior as well.
> >
> > Signed-off-by: Elijah Newren <newren@xxxxxxxxx>
> > ---
> >  Documentation/gitdiffcore.txt | 17 +++++++++++++++++
> >  1 file changed, 17 insertions(+)
> >
> > diff --git a/Documentation/gitdiffcore.txt b/Documentation/gitdiffcore.txt
> > index c970d9fe438a..36ebe364d874 100644
> > --- a/Documentation/gitdiffcore.txt
> > +++ b/Documentation/gitdiffcore.txt
> > @@ -168,6 +168,23 @@ a similarity score different from the default of 50% by giving a
> >  number after the "-M" or "-C" option (e.g. "-M8" to tell it to use
> >  8/10 = 80%).
> >
> > +Note that when rename detection is on but both copy and break
> > +detection are off, rename detection adds a preliminary step that first
> > +checks if files are moved across directories while keeping their
> > +filename the same. If there is a file added to a directory whose
> > +contents is sufficiently similar to a file with the same name that got
> > +deleted from a different directory, it will mark them as renames and
> > +exclude them from the later quadratic step (the one that pairwise
> > +compares all unmatched files to find the "best" matches, determined by
> > +the highest content similarity). So, for example, if
> > +docs/extensions.txt and docs/config/extensions.txt have similar
> > +content, then they will be marked as a rename even if it turns out
> > +that docs/extensions.txt was more similar to src/extension-checks.c.
>
> I'd rather use docs/extensions.md instead of src/extension-checks.c;
> it would be more realistic for .md to be similar to .txt than .c.
>
> With a raised bar for this step, the equation changes a bit, no?
>
>     So, for example, if a deleted docs/ext.txt and an added
>     docs/config/ext.txt are similar enough, they will be marked as a
>     rename and prevent an added docs/ext.md that may be even similar
>     to the deleted docs/ext.txt from being considered as the rename
>     destination in the later step. For this reason, the preliminary
>     "match same filename" step uses a bit higher threshold to mark a
>     file pair as a rename and stop considering other candidates for
>     better matches.
>
> or something?

Good points; I've updated the docs locally to reflect your
suggestions, I'll wait a bit for any other feedback and then send out
a new round with this update.

> > +At most, one comparison is done per file in this preliminary pass; so
> > +if there are several extensions.txt files throughout the directory
> > +hierarchy that were added and deleted, this preliminary step will be
> > +skipped for those files.
>
> Other than that, the whole series looked sensible to my cursory
> read. Thanks.
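
For anyone following the thread, the preliminary "match same filename"
pass described in the quoted documentation can be sketched roughly as
below. This is only an illustrative Python sketch, not git's actual C
code: the names match_same_basename and similarity, the use of
difflib as the similarity measure, and the 0.75 threshold are all
assumptions made for the example (the thread only says the bar is "a
bit higher" than the default 50%). Treating a basename as ambiguous
whenever it appears more than once on either side is likewise one
reading of the "at most one comparison per file" rule.

```python
import posixpath
from collections import defaultdict
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Stand-in similarity score in [0, 1]; git uses its own estimator."""
    return SequenceMatcher(None, a, b).ratio()


def match_same_basename(deleted, added, threshold=0.75):
    """Pair deleted and added paths that share an unambiguous basename.

    deleted, added: dicts mapping path -> file content.
    threshold: hypothetical raised bar for this preliminary pass.
    Returns a dict mapping deleted path -> added path.
    """
    deleted_by_name = defaultdict(list)
    added_by_name = defaultdict(list)
    for path in deleted:
        deleted_by_name[posixpath.basename(path)].append(path)
    for path in added:
        added_by_name[posixpath.basename(path)].append(path)

    renames = {}
    for name, del_paths in deleted_by_name.items():
        add_paths = added_by_name.get(name, [])
        # At most one comparison per file: skip basenames that occur
        # more than once (or not at all) on either side.
        if len(del_paths) != 1 or len(add_paths) != 1:
            continue
        src, dst = del_paths[0], add_paths[0]
        if similarity(deleted[src], added[dst]) >= threshold:
            renames[src] = dst
    return renames


# Example mirroring the docs/ext.txt case from the thread.
deleted = {"docs/ext.txt": "alpha\nbeta\ngamma\ndelta\n"}
added = {
    "docs/config/ext.txt": "alpha\nbeta\ngamma\ndelta\nepsilon\n",
    "docs/ext.md": "alpha\nbeta\ngamma\ndelta\n",
}
result = match_same_basename(deleted, added)
print(result)
```

In this example docs/ext.md is byte-for-byte identical to the deleted
file, yet docs/config/ext.txt is the one paired with it, because the
pair shares a basename and clears the raised bar. Pairs found this way
would then be excluded from the later pairwise comparison.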