"Elijah Newren via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes: > +MAYBE_UNUSED > +static int find_basename_matches(struct diff_options *options, > + int minimum_score, > + int num_src) > +{ > + int i; > + struct strintmap sources; > + struct strintmap dests; > + > + /* Create maps of basename -> fullname(s) for sources and dests */ > + strintmap_init_with_options(&sources, -1, NULL, 0); > + strintmap_init_with_options(&dests, -1, NULL, 0); > + for (i = 0; i < num_src; ++i) { > + char *filename = rename_src[i].p->one->path; > + const char *base; > + > + /* exact renames removed in remove_unneeded_paths_from_src() */ > + assert(!rename_src[i].p->one->rename_used); > + > + /* Record index within rename_src (i) if basename is unique */ > + base = get_basename(filename); > + if (strintmap_contains(&sources, base)) > + strintmap_set(&sources, base, -1); > + else > + strintmap_set(&sources, base, i); > + } > + for (i = 0; i < rename_dst_nr; ++i) { > + char *filename = rename_dst[i].p->two->path; > + const char *base; > + > + if (rename_dst[i].is_rename) > + continue; /* involved in exact match already. */ > + > + /* Record index within rename_dst (i) if basename is unique */ > + base = get_basename(filename); > + if (strintmap_contains(&dests, base)) > + strintmap_set(&dests, base, -1); > + else > + strintmap_set(&dests, base, i); > + } > + > + /* TODO: Make use of basenames source and destination basenames */ ;-) So at this point sources and dests can be used to quickly look up, given a filename, if there is a single src among all sources, and a single dst among all dests, that have the filename. I wonder if the second loop over destinations can be "optimized" further by using the sources map, though. The reason you quash entries with -1 when you see second instance of the same name is because you intend to limit the heuristics only to a uniquely named file among the removed files going to a uniquely named file among the added files, right? So even if a name is unique among dests, if that name has duplicates on the source side, there is no point recording its location. i.e. /* record index within dst if it is unique in both dst and src */ base = get_basename(filename); if (strintmap_contains(&sources, base) || strintmap_contains(&dests, base)) strintmap_set(&dests, base, -1); else strintmap_set(&dests, base, i); perhaps? I guess it depends on what actually will be written in this "TODO" space how effective such a change would be. Presumably, you'd iterate over &sources while skipping entries that record -1, to learn (basename, i), and use the basename found there to consult &dests to see if it yields a non-negative integer j, to notice that rename_src[i] is a good candidate to match rename_dst[j]. If that is the case, then such a change won't help as an optimization at all, as we'd need to consult &dests map with the basename anyway, so let's scratch the above idea. In any case, after we walk over rename_src[] and rename_dst[] once, the number of entries in &sources would be smaller than rename_src[] so iterating over &sources, hunting for entries that record non-negative index into rename_src[] would hopefully be cheaper than the naive loop we've been working with. I like the idea of using the strintmap for this part of the code. Thanks. > + strintmap_clear(&sources); > + strintmap_clear(&dests); > + > + return 0; > +} > + > #define NUM_CANDIDATE_PER_DST 4 > static void record_if_better(struct diff_score m[], struct diff_score *o) > {