Re: [PATCH 6/7] diffcore-rename: simplify and accelerate register_rename_src()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Dec 9, 2020 at 2:40 PM Taylor Blau <me@xxxxxxxxxxxx> wrote:
>
> On Sun, Dec 06, 2020 at 02:54:35AM +0000, Elijah Newren via GitGitGadget wrote:
> > From: Elijah Newren <newren@xxxxxxxxx>
> >
> > reigster_rename_src() took pains to create an array in rename_src which
> > was sorted by pathname of the contained diff_filepair.  However, the
> > fact that this array was sorted was not needed anywhere, and thus
> > represented wasted time.  Simply append to the end of the array, which
> > in a usecase of note saved 45% of diffcore_rename() setup time for me.
> >
> > Technically, there is one difference in the end result for the code.  IF
>
> s/IF/If ?

Indeed; will fix.

> > the caller of diffcore-rename provides a set of pairs with a duplicate
> > filename in the sources (an erroneous input condition), the old code
> > would simply ignore the duplicate (without warning or error).  The new
> > code will simply swallow it and thus allow multiple pairings for the
> > "same" source file.  Checking for this error condition is expensive
> > (basically undoing the optimization) and the only existing callers
> > already avoid passing such an invalid input.  If someone really wants to
> > add some error checking here on this input condition, I believe they
> > should add a separate function which can be called before
> > diffcore_rename(), and then other callers that don't need additional
> > checks because of their design (such as merge-ort) can avoid the extra
> > overhead.
>
> It seems like this is currently impossible to trigger, making any extra
> (expensive) checking of it worthless, no?

I believe that's what it currently amounts to, and I debated just
ripping the paragraph out.

However, a natural question that can easily arise for current
reviewers or future readers of the patch is why was there ever sorting
in the first place if the sorting isn't used?  That question came up
for me, and I dug into it.  Sorting is also used with rename_dst in
nearby code in the file, and there are a few reasons for it there.
The quick indexing that rename_dst needs doesn't apply to rename_src,
but it's not as obvious whether the
broken-trees-with-duplicate-entries rationale applies or not.  I spent
a while digging into it.  (And it's possible that I didn't correctly
check the callers or that in the seven months since I wrote this
message someone added another caller of this code that does pass
multiple diff_filepair-s for the "same" source file.)  Anyway, this
paragraph exists because I had to go down a goose chase to answer this
natural question, and I wanted to provide an answer to anyone else
asking the same question.  Also, in the off chance that anyone did
want to add callers that passed multiple copies of any source file, I
wanted to point out that this modified algorithm would result in a
slight behavioral difference, but that otherwise the modified
algorithm gives identical results.  (And if some future reader
stumbled on the paragraph because they had made such a change, I
wanted to provide a quick suggestion of how to get what they wanted
without adversely affecting performance.)

I hope that helps.  I'm sorry if my worrying about these cases and
discussing them made the patch harder to read or review, but I feel
like diffcore-rename is one of those low-level components you want to
be careful with.  diffcore-rename is one of those parts of the code
where a bug in it might not result in a directly observable breakage
to users but in some secondary or tertiary side-effect showing weird
results (e.g. in a merge you won't necessarily see that A was renamed
to B, instead you get a three-way content merge of original A, other
A, and new B -- or don't get a three-way content merge you might
expect).

> > Also, note that I dropped the return type on the function since it was
> > unconditionally discarded anyway.
> >
> > This patch is being submitted in a different order than its original
> > development, but in a large rebase of many commits with lots of renames
> > and with several optimizations to inexact rename detection,
> > diffcore_rename() setup time was a sizeable chunk of overall runtime.
> > This patch dropped execution time of rebasing 35 commits with lots of
> > renames by 2% overall.
>
> Neat!
>
> > Signed-off-by: Elijah Newren <newren@xxxxxxxxx>
> > ---
> >  diffcore-rename.c | 30 +++---------------------------
> >  1 file changed, 3 insertions(+), 27 deletions(-)
> >
> > diff --git a/diffcore-rename.c b/diffcore-rename.c
> > index 3d637ba4645..816d2fbac44 100644
> > --- a/diffcore-rename.c
> > +++ b/diffcore-rename.c
> > @@ -76,36 +76,12 @@ static struct diff_rename_src {
> > [...]
>
> This looks obviously correct.

Thanks for taking a look!



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux