This series depends on ort-perf-batch-6[1].

This series uses file basenames (portion of the path after final '/',
including extension) in a basic fashion to guide rename detection.

Changes since v3:

 * update documentation as suggested by Junio
 * NEW: add another patch at the end, to simplify patch series that will be
   submitted later (please review!)

[1] https://lore.kernel.org/git/xmqqlfc4byt6.fsf@xxxxxxxxxxxxxxxxxxxxxx/

Elijah Newren (6):
  t4001: add a test comparing basename similarity and content similarity
  diffcore-rename: compute basenames of all source and dest candidates
  diffcore-rename: complete find_basename_matches()
  diffcore-rename: guide inexact rename detection based on basenames
  gitdiffcore doc: mention new preliminary step for rename detection
  merge-ort: call diffcore_rename() directly

 Documentation/gitdiffcore.txt |  20 ++++
 diffcore-rename.c             | 202 +++++++++++++++++++++++++++++++++-
 merge-ort.c                   |  66 +++++++++--
 t/t4001-diff-rename.sh        |  24 ++++
 4 files changed, 301 insertions(+), 11 deletions(-)

base-commit: 7ae9460d3dba84122c2674b46e4339b9d42bdedd
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-843%2Fnewren%2Fort-perf-batch-7-v4
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-843/newren/ort-perf-batch-7-v4
Pull-Request: https://github.com/gitgitgadget/git/pull/843

Range-diff vs v3:

 1:  3e6af929d135 = 1:  3e6af929d135 t4001: add a test comparing basename similarity and content similarity
 2:  4fff9b1ff57b = 2:  4fff9b1ff57b diffcore-rename: compute basenames of all source and dest candidates
 3:  dc26881e4ed3 = 3:  dc26881e4ed3 diffcore-rename: complete find_basename_matches()
 4:  2493f4b2f55d = 4:  2493f4b2f55d diffcore-rename: guide inexact rename detection based on basenames
 5:  fc72d24a3358 ! 5:  4e86ed3f29d4 gitdiffcore doc: mention new preliminary step for rename detection
     @@ Documentation/gitdiffcore.txt: a similarity score different from the default of
      +deleted from a different directory, it will mark them as renames and
      +exclude them from the later quadratic step (the one that pairwise
      +compares all unmatched files to find the "best" matches, determined by
     -+the highest content similarity). So, for example, if
     -+docs/extensions.txt and docs/config/extensions.txt have similar
     -+content, then they will be marked as a rename even if it turns out
     -+that docs/extensions.txt was more similar to src/extension-checks.c.
     -+At most, one comparison is done per file in this preliminary pass; so
     -+if there are several extensions.txt files throughout the directory
     -+hierarchy that were added and deleted, this preliminary step will be
     -+skipped for those files.
     ++the highest content similarity). So, for example, if a deleted
     ++docs/ext.txt and an added docs/config/ext.txt are similar enough, they
     ++will be marked as a rename and prevent an added docs/ext.md that may
     ++be even more similar to the deleted docs/ext.txt from being considered
     ++as the rename destination in the later step. For this reason, the
     ++preliminary "match same filename" step uses a bit higher threshold to
     ++mark a file pair as a rename and stop considering other candidates for
     ++better matches. At most, one comparison is done per file in this
     ++preliminary pass; so if there are several ext.txt files throughout the
     ++directory hierarchy that were added and deleted, this preliminary step
     ++will be skipped for those files.
      +
       Note. When the "-C" option is used with `--find-copies-harder`
       option, 'git diff-{asterisk}' commands feed unmodified filepairs to
 -:  ------------ > 6:  fedb3d323d94 merge-ort: call diffcore_rename() directly

-- 
gitgitgadget
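
For reviewers who want a quick mental model of the preliminary step
described in the documentation change above, here is a minimal sketch
in C. It is not the code from diffcore-rename.c: the names
path_basename(), unique_index(), basename_matches() and fixed_score()
are hypothetical, the 0..100 score scale is assumed for illustration,
and fixed_score() is only a stand-in for a real content comparison. It
pairs a deleted path with an added path only when their shared
basename is unique on both sides, and does at most one comparison per
deleted file.

```c
#include <stddef.h>
#include <string.h>

/* Portion of the path after the final '/', including any extension. */
static const char *path_basename(const char *path)
{
	const char *slash = strrchr(path, '/');
	return slash ? slash + 1 : path;
}

/*
 * Index of the only entry in paths[] whose basename equals base,
 * or -1 if there is no such entry or more than one.
 */
static int unique_index(const char *const *paths, int nr, const char *base)
{
	int i, found = -1;

	for (i = 0; i < nr; i++) {
		if (strcmp(path_basename(paths[i]), base))
			continue;
		if (found >= 0)
			return -1; /* ambiguous basename */
		found = i;
	}
	return found;
}

/* Hypothetical similarity callback; assumed to return 0..100. */
typedef int (*score_fn)(const char *src, const char *dst);

/* Stand-in for a real content comparison; always reports 80. */
static int fixed_score(const char *src, const char *dst)
{
	(void)src;
	(void)dst;
	return 80;
}

/*
 * For each deleted path, set match[i] to the index of the added path
 * it was paired with, or -1. Returns the number of pairs found.
 * Paths whose basename is not unique among both the deleted and the
 * added paths are skipped, so score() runs at most once per file.
 */
static int basename_matches(const char *const *deleted, int nr_del,
			    const char *const *added, int nr_add,
			    score_fn score, int min_score, int *match)
{
	int i, renames = 0;

	for (i = 0; i < nr_del; i++) {
		const char *base = path_basename(deleted[i]);
		int dst;

		match[i] = -1;
		if (unique_index(deleted, nr_del, base) != i)
			continue;
		dst = unique_index(added, nr_add, base);
		if (dst < 0)
			continue;
		if (score(deleted[i], added[dst]) >= min_score) {
			match[i] = dst;
			renames++;
		}
	}
	return renames;
}
```

Anything left with match[i] == -1 would still go through the later
quadratic pairwise comparison. Passing a higher min_score here than in
that later step mirrors the "bit higher threshold" mentioned in the
updated documentation.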