This series depends on en/diffcore-rename (a concatenation of what I was
calling ort-perf-batch-6 and ort-perf-batch-7).

Changes since v3:

 * Update the commit messages (one was out of date after the
   rearrangement), and include Stolee's Reviewed-by

Elijah Newren (10):
  diffcore-rename: use directory rename guided basename comparisons
  diffcore-rename: provide basic implementation of idx_possible_rename()
  diffcore-rename: add a mapping of destination names to their indices
  Move computation of dir_rename_count from merge-ort to diffcore-rename
  diffcore-rename: add function for clearing dir_rename_count
  diffcore-rename: move dir_rename_counts into dir_rename_info struct
  diffcore-rename: extend cleanup_dir_rename_info()
  diffcore-rename: compute dir_rename_counts in stages
  diffcore-rename: limit dir_rename_counts computation to relevant dirs
  diffcore-rename: compute dir_rename_guess from dir_rename_counts

 Documentation/gitdiffcore.txt |   2 +-
 diffcore-rename.c             | 449 ++++++++++++++++++++++++++++++++--
 diffcore.h                    |   7 +
 merge-ort.c                   | 144 +----------
 4 files changed, 449 insertions(+), 153 deletions(-)


base-commit: aeca14f748afc7fb5b65bca56ea2ebd970729814
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-844%2Fnewren%2Fort-perf-batch-8-v4
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-844/newren/ort-perf-batch-8-v4
Pull-Request: https://github.com/gitgitgadget/git/pull/844

Range-diff vs v3:

  1:  6afa9add40b9 !  1:  823d07532e00 diffcore-rename: use directory rename guided basename comparisons
    @@ Commit message
         min_basename_score threshold required for marking the two files as renames.

    -    This commit introduces an idx_possible_rename() function which will give
    +    This commit introduces an idx_possible_rename() function which will do this directory rename detection for us and give
         us the index within rename_dst of the resulting filename.
         For now, this function is hardcoded to return -1 (not found) and just
         hooks up how its results would be used once we have a more complete
         implementation in place.

    +    Reviewed-by: Derrick Stolee <dstolee@xxxxxxxxxxxxx>
         Signed-off-by: Elijah Newren <newren@xxxxxxxxx>

      ## Documentation/gitdiffcore.txt ##
  2:  40f57bcc2055 !  2:  2dde621d7de5 diffcore-rename: add a new idx_possible_rename function
    @@ Metadata
     Author: Elijah Newren <newren@xxxxxxxxx>

      ## Commit message ##
    -    diffcore-rename: add a new idx_possible_rename function
    +    diffcore-rename: provide basic implementation of idx_possible_rename()

    -    find_basename_matches() is great when both the remaining set of possible
    -    rename sources and the remaining set of possible rename destinations
    -    have exactly one file each with a given basename. It allows us to match
    -    up files that have been moved to different directories without changing
    -    filenames.
    +    Add a new struct dir_rename_info with various values we need inside our
    +    idx_possible_rename() function introduced in the previous commit. Add a
    +    basic implementation for this function showing how we plan to use the
    +    variables, but which will just return early with a value of -1 (not
    +    found) when those variables are not set up.

    -    When basenames are not unique, though, we want to be able to guess which
    -    directories the source files have been moved to. Since this is the job
    -    of directory rename detection, we employ it. However, since it is a
    -    directory rename detection idea, we also limit it to cases where we know
    -    there could have been a directory rename, i.e. where the source
    -    directory has been removed. This has to be signalled by dirs_removed
    -    being non-NULL and containing an entry for the relevant directory.
    -    Since merge-ort.c is the only caller that currently does so, this
    -    optimization is only effective for merge-ort right now. In the future,
    -    this condition could be reconsidered or we could modify other callers to
    -    pass the necessary strset.
    -
    -    Anyway, that's a lot of background so that we can actually describe the
    -    new function. Add an idx_possible_rename() function which combines the
    -    recently added dir_rename_guess and idx_map fields to provide the index
    -    within rename_dst of a potential match for a given file.
    -
    -    Future commits will add checks after calling this function to compare
    -    the resulting 'likely rename' candidates to see if the two files meet
    -    the elevated min_basename_score threshold for marking them as actual
    -    renames.
    +    Future commits will do the work necessary to set up those other
    +    variables so that idx_possible_rename() does not always return -1.

    +    Reviewed-by: Derrick Stolee <dstolee@xxxxxxxxxxxxx>
         Signed-off-by: Elijah Newren <newren@xxxxxxxxx>

      ## diffcore-rename.c ##
  3:  0e14961574ea !  3:  21b9cf1da30e diffcore-rename: add a mapping of destination names to their indices
    @@ Commit message
         dir_rename_guess; these will be more fully populated in subsequent
         commits.

    +    Reviewed-by: Derrick Stolee <dstolee@xxxxxxxxxxxxx>
         Signed-off-by: Elijah Newren <newren@xxxxxxxxx>

      ## diffcore-rename.c ##
  4:  9b9d5b207b03 !  4:  3617b0209cc4 Move computation of dir_rename_count from merge-ort to diffcore-rename
    @@ Commit message
         preliminary computation of dir_rename_count after exact rename
         detection, followed by some updates after inexact rename detection.

    +    Reviewed-by: Derrick Stolee <dstolee@xxxxxxxxxxxxx>
         Signed-off-by: Elijah Newren <newren@xxxxxxxxx>

      ## diffcore-rename.c ##
  5:  f286e89464ea !  5:  2baf39d82f3e diffcore-rename: add function for clearing dir_rename_count
    @@ Commit message
         for clearing, or partially clearing it out. Add a
         partial_clear_dir_rename_count() function for this purpose.

    +    Reviewed-by: Derrick Stolee <dstolee@xxxxxxxxxxxxx>
         Signed-off-by: Elijah Newren <newren@xxxxxxxxx>

      ## diffcore-rename.c ##
  6:  ab353f2e75eb !  6:  02f1f7c02d32 diffcore-rename: move dir_rename_counts into dir_rename_info struct
    @@ Commit message
         dir_rename_info struct.
         Future commits will then make dir_rename_counts be computed in
         stages, and add computation of dir_rename_guess.

    +    Reviewed-by: Derrick Stolee <dstolee@xxxxxxxxxxxxx>
         Signed-off-by: Elijah Newren <newren@xxxxxxxxx>

      ## diffcore-rename.c ##
  7:  bd50d9e53804 !  7:  9c3436840534 diffcore-rename: extend cleanup_dir_rename_info()
    @@ Commit message
         Extend cleanup_dir_rename_info() to handle these two different cases,
         cleaning up the relevant bits of information for each case.

    +    Reviewed-by: Derrick Stolee <dstolee@xxxxxxxxxxxxx>
         Signed-off-by: Elijah Newren <newren@xxxxxxxxx>

      ## diffcore-rename.c ##
  8:  44cfae6505f2 !  8:  6bd398d3707e diffcore-rename: compute dir_rename_counts in stages
    @@ Commit message
         augment the counts via calling update_dir_rename_counts() after each
         basename-guide and inexact rename detection match is found.

    +    Reviewed-by: Derrick Stolee <dstolee@xxxxxxxxxxxxx>
         Signed-off-by: Elijah Newren <newren@xxxxxxxxx>

      ## diffcore-rename.c ##
  9:  752aff3a7995 !  9:  46304aaebf5a diffcore-rename: limit dir_rename_counts computation to relevant dirs
    @@ Commit message
         info->relevant_source_dirs variable for this purpose, even though at
         this stage we will only set it to dirs_removed for simplicity.

    +    Reviewed-by: Derrick Stolee <dstolee@xxxxxxxxxxxxx>
         Signed-off-by: Elijah Newren <newren@xxxxxxxxx>

      ## diffcore-rename.c ##
 10:  65f7bfb735f2 ! 10:  4be565c47208 diffcore-rename: compute dir_rename_guess from dir_rename_counts
    @@ Commit message
             mega-renames:    188.754 s ±  0.284 s   130.465 s ±  0.259 s
             just-one-mega:     5.599 s ±  0.019 s     3.958 s ±  0.010 s

    +    Reviewed-by: Derrick Stolee <dstolee@xxxxxxxxxxxxx>
         Signed-off-by: Elijah Newren <newren@xxxxxxxxx>

      ## diffcore-rename.c ##

--
gitgitgadget