Re: [PATCH v1] diffcore-rename: speed up register_rename_src

Jeff Hostetler <git@xxxxxxxxxxxxxxxxx> · Thu, 20 Apr 2017 14:08:46 -0400

On 4/20/2017 12:13 PM, Jeff King wrote:
On Thu, Apr 20, 2017 at 10:00:04AM -0400, Jeff Hostetler wrote:

Perhaps the thing to learn from this (and the other ones) is that
we have lots of places where we are building a sorted list by
iterating over a sorted list.  The insert routines are general
purpose and cannot assume this, so they search first.  Perhaps it
would be clearer to have independent _append and _insert functions
and have the caller explicitly call the appropriate one.  The mainline
iterations on the existing index could just call the _append form
and never have to worry about searching or the negative-integer
return trick.  Whereas, the random iterations (such as on the
command's arg list), would always call the _insert form.

Yes. I'd be much happier if your patch was flipping between two
general-purpose insertion functions. And if that same trick was used on
the dst side.

Or even, given that this these functions are called from a single
location that has sorted input, the binary search was just replaced
completely with an append combined with a sort-check.

That's not the minimal change you were going for, but I think the end
result is simpler and more consistent.

OK, let me take a stab at something like that and
see where it takes me.

WRT your earlier comment about how often we add or delete 4M
files and then run status.  The use case that started this was a
1% sparse-checkout followed by a read-tree (which reset the
skip-worktree bits) and then status (which thought 99% of the
worktree had been deleted or maybe renamed).  There are probably
other ways to get into this state, but that's how this started.
The more subtle point is that -- for these obscenely large
values of n -- any time I see an O(n log n) operation that could
or should be O(n), I want to stop and look at it.