From: Elijah Newren <newren@xxxxxxxxx>

We want to make use of unique basenames to help inform rename detection,
so that more likely pairings can be checked first. (src/moduleA/foo.txt
and source/module/A/foo.txt are likely related if there are no other
'foo.txt' files among the deleted and added files.) Add a new function,
not yet used, which creates a map of the unique basenames within
rename_src and another within rename_dst, together with the indices
within rename_src/rename_dst where those basenames show up. Non-unique
basenames still show up in the map, but have an invalid index (-1).

This function was inspired by the fact that in real-world repositories,
most renames do not involve a basename change. Here are some sample
repositories and the percentage of their historical renames (as of early
2020) that did not involve a basename change:
  * linux: 76%
  * gcc: 64%
  * gecko: 79%
  * webkit: 89%

Signed-off-by: Elijah Newren <newren@xxxxxxxxx>
---
 diffcore-rename.c | 53 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 53 insertions(+)

diff --git a/diffcore-rename.c b/diffcore-rename.c
index 74930716e70d..1c52077b04e5 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -367,6 +367,59 @@ static int find_exact_renames(struct diff_options *options)
 	return renames;
 }
 
+MAYBE_UNUSED
+static int find_basename_matches(struct diff_options *options,
+				 int minimum_score,
+				 int num_src)
+{
+	int i;
+	struct strintmap sources;
+	struct strintmap dests;
+
+	/* Create maps of basename -> index for sources and dests */
+	strintmap_init_with_options(&sources, -1, NULL, 0);
+	strintmap_init_with_options(&dests, -1, NULL, 0);
+	for (i = 0; i < num_src; ++i) {
+		char *filename = rename_src[i].p->one->path;
+		char *base;
+
+		/* exact renames removed in remove_unneeded_paths_from_src() */
+		assert(!rename_src[i].p->one->rename_used);
+
+		base = strrchr(filename, '/');
+		base = (base ? base+1 : filename);
+
+		/* Record index within rename_src (i) if basename is unique */
+		if (strintmap_contains(&sources, base))
+			strintmap_set(&sources, base, -1);
+		else
+			strintmap_set(&sources, base, i);
+	}
+	for (i = 0; i < rename_dst_nr; ++i) {
+		char *filename = rename_dst[i].p->two->path;
+		char *base;
+
+		if (rename_dst[i].is_rename)
+			continue; /* involved in exact match already. */
+
+		base = strrchr(filename, '/');
+		base = (base ? base+1 : filename);
+
+		/* Record index within rename_dst (i) if basename is unique */
+		if (strintmap_contains(&dests, base))
+			strintmap_set(&dests, base, -1);
+		else
+			strintmap_set(&dests, base, i);
+	}
+
+	/* TODO: Make use of source and destination basenames */
+
+	strintmap_clear(&sources);
+	strintmap_clear(&dests);
+
+	return 0;
+}
+
 #define NUM_CANDIDATE_PER_DST 4
 static void record_if_better(struct diff_score m[], struct diff_score *o)
 {
-- 
gitgitgadget
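The map-building step above, and one way the resulting maps might later be consulted, can be illustrated with a small standalone C sketch. It does not use git's internals and is not the eventual implementation: `struct basename_map` and its helpers are hypothetical stand-ins for git's `strintmap` (a linear-search array, adequate only for a handful of paths), `pair_unique_basenames()` is a hypothetical helper for the step the patch leaves as a TODO, and the exact-rename filtering (`rename_used`, `is_rename`) is omitted. As in the patch, a basename seen more than once maps to -1, and a missing key also reads back as -1.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for git's strintmap: small array-backed map. */
#define MAX_ENTRIES 64

struct basename_map {
	const char *key[MAX_ENTRIES]; /* points into caller's path strings */
	int val[MAX_ENTRIES];         /* index into paths, or -1 if duplicated */
	int nr;
};

/* Part of path after the final '/', mirroring the strrchr() logic above. */
static const char *basename_of(const char *path)
{
	const char *slash = strrchr(path, '/');
	return slash ? slash + 1 : path;
}

/* Slot holding key, or -1 if key has not been recorded. */
static int map_find(const struct basename_map *m, const char *key)
{
	for (int i = 0; i < m->nr; i++)
		if (!strcmp(m->key[i], key))
			return i;
	return -1;
}

/* Record idx for base if first seen; mark base non-unique (-1) otherwise. */
static void map_record(struct basename_map *m, const char *base, int idx)
{
	int slot = map_find(m, base);

	if (slot >= 0) {
		m->val[slot] = -1;
		return;
	}
	assert(m->nr < MAX_ENTRIES);
	m->key[m->nr] = base;
	m->val[m->nr] = idx;
	m->nr++;
}

/* Stored index for key; -1 if key is missing or non-unique. */
static int map_get(const struct basename_map *m, const char *key)
{
	int slot = map_find(m, key);
	return slot >= 0 ? m->val[slot] : -1;
}

static void build_map(struct basename_map *m, const char **paths, int n)
{
	m->nr = 0;
	for (int i = 0; i < n; i++)
		map_record(m, basename_of(paths[i]), i);
}

/*
 * Hypothetical pairing step: pair[i] = j when src[i]'s basename is unique
 * among sources and also unique among destinations (found at dst[j]);
 * otherwise pair[i] = -1. Returns the number of candidate pairs.
 */
static int pair_unique_basenames(const char **src, int nsrc,
				 const char **dst, int ndst, int *pair)
{
	struct basename_map sources, dests;
	int count = 0;

	build_map(&sources, src, nsrc);
	build_map(&dests, dst, ndst);
	for (int i = 0; i < nsrc; i++) {
		const char *base = basename_of(src[i]);

		pair[i] = -1;
		if (map_get(&sources, base) != i)
			continue; /* basename shared by several sources */
		int j = map_get(&dests, base);
		if (j >= 0) {
			pair[i] = j;
			count++;
		}
	}
	return count;
}
```

With sources `src/moduleA/foo.txt`, `a/README`, `b/README` and destinations `source/module/A/foo.txt`, `c/README`, `d/bar.c`, only the `foo.txt` pair is proposed: `README` is not unique among the sources, so it is left for the regular similarity-based comparison.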