Re: [PATCH v2 3/4] diffcore-rename: guide inexact rename detection based on basenames

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Feb 9, 2021 at 5:33 AM Derrick Stolee <stolee@xxxxxxxxx> wrote:
>
> On 2/9/2021 6:32 AM, Elijah Newren via GitGitGadget wrote:
> > +     num_sources = rename_src_nr;
> > +
> > +     if (want_copies || break_idx) {
> > +             /*
> > +              * Cull sources:
> > +              *   - remove ones corresponding to exact renames
> > +              */
> > +             trace2_region_enter("diff", "cull after exact", options->repo);
> > +             remove_unneeded_paths_from_src(want_copies);
> > +             trace2_region_leave("diff", "cull after exact", options->repo);
>
> Isn't this the same as
>
> > +     } else {
> > +             /* Determine minimum score to match basenames */
> > +             int min_basename_score = (int)(5*minimum_score + 0*MAX_SCORE)/5;
> > +
> > +             /*
> > +              * Cull sources:
> > +              *   - remove ones involved in renames (found via exact match)
> > +              */
> > +             trace2_region_enter("diff", "cull exact", options->repo);
> > +             remove_unneeded_paths_from_src(want_copies);
> > +             trace2_region_leave("diff", "cull exact", options->repo);
>
> ...this? (except the regions are renamed)
>
> Could this be simplified as:
>
> +       num_sources = rename_src_nr;
> +
> +       trace2_region_enter("diff", "cull after exact", options->repo);
> +       remove_unneeded_paths_from_src(want_copies);
> +       trace2_region_leave("diff", "cull after exact", options->repo);
> +
> +       if (!want_copies && !break_idx) {
> +               /* Determine minimum score to match basenames */
>
> I realize you probably changed the region names on purpose to distinguish
> that there are two "cull" regions in the case of no copies, but I think
> that isn't really worth different names. Better to have a consistent region
> name around the same activity in both cases.

Actually, the reason they were split is because a later series has to
call remove_unneeded_paths_from_src() differently for the two
branches.  The patch history was so dirty that the easiest way to
clean things up was just to create completely new patches pulling off
relevant chunks of code and touching them up; while doing that, I
didn't notice that the changes I made to split out this early series
resulted in this near-duplication.

So, I can join them...but they would just need to be split back out in
my "Optimization batch 9" series.

I'm happy to fix the region name to make them the same.  Is that good
enough, or would you rather these common code regions combined for
this patch and then split out later?

>
> > +             int min_basename_score = (int)(5*minimum_score + 0*MAX_SCORE)/5;
>
> Did you intend for this to be 5*min + 0*MAX? This seems wrong if you want
> this value to be different from minimum_score.

In my cover letter I noted that I didn't know what to set this to and
wanted input; yesterday you said it wasn't worth worrying about using
a different value, but Junio suggested we should use one (but didn't
state how much higher it should be or whether it should be user input
driven).  This weird construct was here just to show that it is easy
to feed a different score into the basename comparison than what is
used elsewhere; I can fix it up once I get word on what Junio wants to
see.

Since I didn't know what to use, though, and I didn't want to get a
different set of numbers for the final commit message on the speedup
achieved if I'm just going to throw them away and recompute once I
find out what Junio wants here, I did intentionally set the
computation to just give us minimum_score, for now.

> > +
> > +             /* Utilize file basenames to quickly find renames. */
> > +             trace2_region_enter("diff", "basename matches", options->repo);
> > +             rename_count += find_basename_matches(options,
> > +                                                   min_basename_score,
> > +                                                   rename_src_nr);
> > +             trace2_region_leave("diff", "basename matches", options->repo);
> > +
> > +             /*
> > +              * Cull sources, again:
> > +              *   - remove ones involved in renames (found via basenames)
> > +              */
> > +             trace2_region_enter("diff", "cull basename", options->repo);
> > +             remove_unneeded_paths_from_src(want_copies);
> > +             trace2_region_leave("diff", "cull basename", options->repo);
> > +     }
> > +
>
> Thanks,
> -Stolee



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux