"Elijah Newren via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes: > From: Elijah Newren <newren@xxxxxxxxx> > > The last few patches have introduced a new preliminary step when rename > detection is on but both break detection and copy detection are off. > Document this new step. While we're at it, add a testcase that checks > the new behavior as well. > > Signed-off-by: Elijah Newren <newren@xxxxxxxxx> > --- > Documentation/gitdiffcore.txt | 17 +++++++++++++++++ > 1 file changed, 17 insertions(+) > > diff --git a/Documentation/gitdiffcore.txt b/Documentation/gitdiffcore.txt > index c970d9fe438a..36ebe364d874 100644 > --- a/Documentation/gitdiffcore.txt > +++ b/Documentation/gitdiffcore.txt > @@ -168,6 +168,23 @@ a similarity score different from the default of 50% by giving a > number after the "-M" or "-C" option (e.g. "-M8" to tell it to use > 8/10 = 80%). > > +Note that when rename detection is on but both copy and break > +detection are off, rename detection adds a preliminary step that first > +checks if files are moved across directories while keeping their > +filename the same. If there is a file added to a directory whose > +contents is sufficiently similar to a file with the same name that got > +deleted from a different directory, it will mark them as renames and > +exclude them from the later quadratic step (the one that pairwise > +compares all unmatched files to find the "best" matches, determined by > +the highest content similarity). So, for example, if > +docs/extensions.txt and docs/config/extensions.txt have similar > +content, then they will be marked as a rename even if it turns out > +that docs/extensions.txt was more similar to src/extension-checks.c. I'd rather use docs/extensions.md instead of src/extension-checks.c; it would be more realistic for .md to be similar to .txt than .c. With a raised bar for this step, the equation changes a bit, no? So, for example, if a deleted docs/ext.txt and an added docs/config/ext.txt are similar enough, they will be marked as a rename and prevent an added docs/ext.md that may be even similar to the deleted docs/ext.txt from being considered as the rename destination in the later step. For this reason, the preliminary "match same filename" step uses a bit higher threshold to mark a file pair as a rename and stop considering other candidates for better matches. or something? > +At most, one comparison is done per file in this preliminary pass; so > +if there are several extensions.txt files throughout the directory > +hierarchy that were added and deleted, this preliminary step will be > +skipped for those files. Other than that, the whole series looked sensible to my cursory read. Thanks.