On 4/7/2017 12:46 AM, Jeff King wrote:
On Thu, Apr 06, 2017 at 04:34:39PM +0000, git@xxxxxxxxxxxxxxxxx wrote:
Teach add_index_entry_with_check() and has_dir_name()
to avoid index lookups if the given path sorts after
the last entry in the index.
This saves at least 2 binary searches per entry.
This improves performance during checkout and read-tree because
merge_working_tree() and unpack_trees() processes a list of already
sorted entries.
Just thinking about this algorithmically for a moment. You're saving the
binary search when the input is given in sorted order. But in other
cases you're adding an extra strcmp() before the binary search begins.
So it's a tradeoff.
How often is the input sorted? You save O(log n) strcmps for a "hit"
with your patch, and one for a "miss". So it's a net win if we expect at
least 1/log(n) of additions to be sorted (I'm talking about individual
calls, but it should scale linearly either way over a set of n calls).
I have no clue if that's a reasonable assumption or not.
I was seeing checkout call merge_working_tree to iterate over the
source index/trees and call add_index_entry() for each. For example,
in a "checkout -b" like operation where both sides are the same, this
calls keep_entry() which appends the entry to the new index array.
The append path should always be taken because the iteration is being
driven from a sorted list.
I would think calls to add/stage individual files arrive in random
order, so I'm not suggesting replacing the code -- just checking the
end first.
Jeff