On Thu, 2016-01-21 at 16:30 -0500, Jeff King wrote: > On Thu, Jan 21, 2016 at 04:11:48PM -0500, David Turner wrote: > > > While unpacking trees (e.g. during git checkout), when we hit a > > cache > > entry that's past and outside our path, we cut off iteration. > > > > This provides about a 45% speedup on git checkout between master > > and > > master^20000 on Twitter's monorepo. Speedup in general will depend > > on > > repostitory structure, number of changes, and packfile packing > > decisions. > > I feel like I'm missing the explanation of the quadratic part. From > looking at the patch, my guess is: > > 1. We're doing a linear walk in a data structure (a "struct > index_state"). > > 2. For each element, we look it up in another structure > ("struct traverse_info") with a linear search. > > That leaves us at O(m*n), but if we assume both are on the same > order of magnitude, that's quadratic. No, I think, it's the opposite order: we're doing a linear walk over the incoming tree and for each entry, we're calling find_cache_pos. find_cache_pos was doing a linear walk over struct index_state. But the same algorithmic complexity holds. > 3. The fix works by knowing that once a lookup in (2) fails once, > it's > likely to fail for all the remainder, and we short-cut that case > and skip out of (1) completely. > > But that makes me wonder. Aren't we still quadratic in the case that > ce_in_traverse_path() returns true? I think that doesn't happen very often, because it requires that the paths match up. > If so, would we benefit from either: > > a. Improving the complexity of ce_in_traverse_path, to say O(log > n), > which would give us O(n log n) for the whole operation in all > cases? > > b. If both lists are already sorted, maybe doing a list-merge to > compare them in O(2n) time? (b) appears to be now (roughly) what we're now doing. > I'm fairly ignorant of this part of the code, so there's probably a > good > reason why my suggestion is unworkable. I am also quite ignorant of this part of the code; I just looked at perf and did some simple counting. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html