On Fri, Apr 07, 2017 at 02:27:24PM -0400, Jeff Hostetler wrote:

> > Just thinking about this algorithmically for a moment. You're saving the
> > binary search when the input is given in sorted order. But in other
> > cases you're adding an extra strcmp() before the binary search begins.
> > So it's a tradeoff.
> >
> > How often is the input sorted? You save O(log n) strcmps for a "hit"
> > with your patch, and one for a "miss". So it's a net win if we expect at
> > least 1/log(n) of additions to be sorted (I'm talking about individual
> > calls, but it should scale linearly either way over a set of n calls).
> >
> > I have no clue if that's a reasonable assumption or not.
>
> I was seeing checkout call merge_working_tree to iterate over the
> source index/trees and call add_index_entry() for each. For example,
> in a "checkout -b" like operation where both sides are the same, this
> calls keep_entry() which appends the entry to the new index array.
> The append path should always be taken because the iteration is being
> driven from a sorted list.
>
> I would think calls to add/stage individual files arrive in random
> order, so I'm not suggesting replacing the code -- just checking the
> end first.

Right, what I was wondering is how much this costs in those
random-order cases. We _know_ it speeds up the cases you care about,
but I want to make sure that it is not making some other case worse.
How often do the random-order cases come up, and how much are they
slowed?

I suspect in practice that calls here fall into one of two camps:
feeding a small-ish (compared to the total number of entries) set of
paths, or feeding _every_ path. And if you are feeding every path, you
are likely to do so in sorted order, rather than a random jumble.

So it helps in the big cases, and the small cases are presumably small
enough that we don't care much. At least that seems like a plausible
line of reasoning to me. ;)

-Peff
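
To make the tradeoff concrete, here is a rough, self-contained sketch
of the "check the end first" idea under discussion. This is not Git's
actual add_index_entry(); the toy_index structure and helper names are
made up purely for illustration:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* a hypothetical, simplified "index": just a sorted array of paths */
struct toy_index {
	const char **entries;
	size_t nr, alloc;
};

/* binary search: return the position where 'path' should be inserted */
static size_t find_pos(const struct toy_index *idx, const char *path)
{
	size_t lo = 0, hi = idx->nr;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		if (strcmp(idx->entries[mid], path) < 0)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo;
}

static void add_entry(struct toy_index *idx, const char *path)
{
	size_t pos;

	/*
	 * Fast path: one strcmp() against the current last entry catches
	 * the sorted-input case and lets us append without the O(log n)
	 * binary search. Random-order input pays only this one extra
	 * comparison before falling back to the search.
	 */
	if (idx->nr && strcmp(idx->entries[idx->nr - 1], path) < 0)
		pos = idx->nr;
	else
		pos = find_pos(idx, path);

	if (idx->nr == idx->alloc) {
		idx->alloc = idx->alloc ? idx->alloc * 2 : 16;
		idx->entries = realloc(idx->entries,
				       idx->alloc * sizeof(*idx->entries));
		if (!idx->entries) {
			perror("realloc");
			exit(1);
		}
	}
	memmove(idx->entries + pos + 1, idx->entries + pos,
		(idx->nr - pos) * sizeof(*idx->entries));
	idx->entries[pos] = path;
	idx->nr++;
}

int main(void)
{
	struct toy_index idx = { NULL, 0, 0 };
	const char *sorted[] = { "Makefile", "cache.h", "read-cache.c" };
	size_t i;

	/* sorted input: every call after the first takes the append path */
	for (i = 0; i < 3; i++)
		add_entry(&idx, sorted[i]);

	/* out-of-order input: one extra strcmp(), then the binary search */
	add_entry(&idx, "diff.c");

	for (i = 0; i < idx.nr; i++)
		printf("%s\n", idx.entries[i]);
	free(idx.entries);
	return 0;
}

On fully sorted input every call after the first takes the single-strcmp()
append path; on random-order input the only added cost is that one extra
comparison before the usual binary search, which is the tradeoff the
discussion above is weighing.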