Re: [PATCH v6 0/3] read-cache: speed up add_index_entry

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Apr 07, 2017 at 02:27:24PM -0400, Jeff Hostetler wrote:

> > Just thinking about this algorithmically for a moment. You're saving the
> > binary search when the input is given in sorted order. But in other
> > cases you're adding an extra strcmp() before the binary search begins.
> > So it's a tradeoff.
> > 
> > How often is the input sorted?  You save O(log n) strcmps for a "hit"
> > with your patch, and one for a "miss". So it's a net win if we expect at
> > least 1/log(n) of additions to be sorted (I'm talking about individual
> > calls, but it should scale linearly either way over a set of n calls).
> > 
> > I have no clue if that's a reasonable assumption or not.
> 
> I was seeing checkout call merge_working_tree to iterate over the
> source index/trees and call add_index_entry() for each.  For example,
> in a "checkout -b" like operation where both sides are the same, this
> calls keep_entry() which appends the entry to the new index array.
> The append path should always be taken because the iteration is being
> driven from a sorted list.
> 
> I would think calls to add/stage individual files arrive in random
> order, so I'm not suggesting replacing the code -- just checking the
> end first.

Right, what I was wondering is how much this costs in those random-order
cases. We _know_ it speeds up the cases you care about, but I want to
make sure that it is not making some other case worse. How often do the
random-order cases come up, and how much are they slowed?

I suspect in practice that calls here fall into one of two camps:
feeding a small-ish (compared to the total number of entries) set of
paths, or feeding _every_ path. And if you are feeding every path, you
are likely to do so in sorted order, rather than a random jumble. So it
helps in the big cases, and the small cases are presumably small enough
that we don't care much.

At least that seems like a plausible line of reasoning to me. ;)

-Peff



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]