On Wed, Dec 9, 2020 at 2:40 PM Taylor Blau <me@xxxxxxxxxxxx> wrote: > > On Sun, Dec 06, 2020 at 02:54:35AM +0000, Elijah Newren via GitGitGadget wrote: > > From: Elijah Newren <newren@xxxxxxxxx> > > > > reigster_rename_src() took pains to create an array in rename_src which > > was sorted by pathname of the contained diff_filepair. However, the > > fact that this array was sorted was not needed anywhere, and thus > > represented wasted time. Simply append to the end of the array, which > > in a usecase of note saved 45% of diffcore_rename() setup time for me. > > > > Technically, there is one difference in the end result for the code. IF > > s/IF/If ? Indeed; will fix. > > the caller of diffcore-rename provides a set of pairs with a duplicate > > filename in the sources (an erroneous input condition), the old code > > would simply ignore the duplicate (without warning or error). The new > > code will simply swallow it and thus allow multiple pairings for the > > "same" source file. Checking for this error condition is expensive > > (basically undoing the optimization) and the only existing callers > > already avoid passing such an invalid input. If someone really wants to > > add some error checking here on this input condition, I believe they > > should add a separate function which can be called before > > diffcore_rename(), and then other callers that don't need additional > > checks because of their design (such as merge-ort) can avoid the extra > > overhead. > > It seems like this is currently impossible to trigger, making any extra > (expensive) checking of it worthless, no? I believe that's what it currently amounts to, and I debated just ripping the paragraph out. However, a natural question that can easily arise for current reviewers or future readers of the patch is why was there ever sorting in the first place if the sorting isn't used? That question came up for me, and I dug into it. Sorting is also used with rename_dst in nearby code in the file, and there are a few reasons for it there. The quick indexing that rename_dst needs doesn't apply to rename_src, but it's not as obvious whether the broken-trees-with-duplicate-entries rationale applies or not. I spent a while digging into it. (And it's possible that I didn't correctly check the callers or that in the seven months since I wrote this message someone added another caller of this code that does pass multiple diff_filepair-s for the "same" source file.) Anyway, this paragraph exists because I had to go down a goose chase to answer this natural question, and I wanted to provide an answer to anyone else asking the same question. Also, in the off chance that anyone did want to add callers that passed multiple copies of any source file, I wanted to point out that this modified algorithm would result in a slight behavioral difference, but that otherwise the modified algorithm gives identical results. (And if some future reader stumbled on the paragraph because they had made such a change, I wanted to provide a quick suggestion of how to get what they wanted without adversely affecting performance.) I hope that helps. I'm sorry if my worrying about these cases and discussing them made the patch harder to read or review, but I feel like diffcore-rename is one of those low-level components you want to be careful with. diffcore-rename is one of those parts of the code where a bug in it might not result in a directly observable breakage to users but in some secondary or tertiary side-effect showing weird results (e.g. in a merge you won't necessarily see that A was renamed to B, instead you get a three-way content merge of original A, other A, and new B -- or don't get a three-way content merge you might expect). > > Also, note that I dropped the return type on the function since it was > > unconditionally discarded anyway. > > > > This patch is being submitted in a different order than its original > > development, but in a large rebase of many commits with lots of renames > > and with several optimizations to inexact rename detection, > > diffcore_rename() setup time was a sizeable chunk of overall runtime. > > This patch dropped execution time of rebasing 35 commits with lots of > > renames by 2% overall. > > Neat! > > > Signed-off-by: Elijah Newren <newren@xxxxxxxxx> > > --- > > diffcore-rename.c | 30 +++--------------------------- > > 1 file changed, 3 insertions(+), 27 deletions(-) > > > > diff --git a/diffcore-rename.c b/diffcore-rename.c > > index 3d637ba4645..816d2fbac44 100644 > > --- a/diffcore-rename.c > > +++ b/diffcore-rename.c > > @@ -76,36 +76,12 @@ static struct diff_rename_src { > > [...] > > This looks obviously correct. Thanks for taking a look!