On Tue, Feb 9, 2021 at 5:33 AM Derrick Stolee <stolee@xxxxxxxxx> wrote: > > On 2/9/2021 6:32 AM, Elijah Newren via GitGitGadget wrote: > > + num_sources = rename_src_nr; > > + > > + if (want_copies || break_idx) { > > + /* > > + * Cull sources: > > + * - remove ones corresponding to exact renames > > + */ > > + trace2_region_enter("diff", "cull after exact", options->repo); > > + remove_unneeded_paths_from_src(want_copies); > > + trace2_region_leave("diff", "cull after exact", options->repo); > > Isn't this the same as > > > + } else { > > + /* Determine minimum score to match basenames */ > > + int min_basename_score = (int)(5*minimum_score + 0*MAX_SCORE)/5; > > + > > + /* > > + * Cull sources: > > + * - remove ones involved in renames (found via exact match) > > + */ > > + trace2_region_enter("diff", "cull exact", options->repo); > > + remove_unneeded_paths_from_src(want_copies); > > + trace2_region_leave("diff", "cull exact", options->repo); > > ...this? (except the regions are renamed) > > Could this be simplified as: > > + num_sources = rename_src_nr; > + > + trace2_region_enter("diff", "cull after exact", options->repo); > + remove_unneeded_paths_from_src(want_copies); > + trace2_region_leave("diff", "cull after exact", options->repo); > + > + if (!want_copies && !break_idx) { > + /* Determine minimum score to match basenames */ > > I realize you probably changed the region names on purpose to distinguish > that there are two "cull" regions in the case of no copies, but I think > that isn't really worth different names. Better to have a consistent region > name around the same activity in both cases. Actually, the reason they were split is because a later series has to call remove_unneeded_paths_from_src() differently for the two branches. The patch history was so dirty that the easiest way to clean things up was just to create completely new patches pulling off relevant chunks of code and touching them up; while doing that, I didn't notice that the changes I made to split out this early series resulted in this near-duplication. So, I can join them...but they would just need to be split back out in my "Optimization batch 9" series. I'm happy to fix the region name to make them the same. Is that good enough, or would you rather these common code regions combined for this patch and then split out later? > > > + int min_basename_score = (int)(5*minimum_score + 0*MAX_SCORE)/5; > > Did you intend for this to be 5*min + 0*MAX? This seems wrong if you want > this value to be different from minimum_score. In my cover letter I noted that I didn't know what to set this to and wanted input; yesterday you said it wasn't worth worrying about using a different value, but Junio suggested we should use one (but didn't state how much higher it should be or whether it should be user input driven). This weird construct was here just to show that it is easy to feed a different score into the basename comparison than what is used elsewhere; I can fix it up once I get word on what Junio wants to see. Since I didn't know what to use, though, and I didn't want to get a different set of numbers for the final commit message on the speedup achieved if I'm just going to throw them away and recompute once I find out what Junio wants here, I did intentionally set the computation to just give us minimum_score, for now. > > + > > + /* Utilize file basenames to quickly find renames. */ > > + trace2_region_enter("diff", "basename matches", options->repo); > > + rename_count += find_basename_matches(options, > > + min_basename_score, > > + rename_src_nr); > > + trace2_region_leave("diff", "basename matches", options->repo); > > + > > + /* > > + * Cull sources, again: > > + * - remove ones involved in renames (found via basenames) > > + */ > > + trace2_region_enter("diff", "cull basename", options->repo); > > + remove_unneeded_paths_from_src(want_copies); > > + trace2_region_leave("diff", "cull basename", options->repo); > > + } > > + > > Thanks, > -Stolee