Re: [PATCH v2] unpack-trees: fix accidentally quadratic behavior

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Jan 21, 2016 at 04:11:48PM -0500, David Turner wrote:

> While unpacking trees (e.g. during git checkout), when we hit a cache
> entry that's past and outside our path, we cut off iteration.
> 
> This provides about a 45% speedup on git checkout between master and
> master^20000 on Twitter's monorepo.  Speedup in general will depend on
> repostitory structure, number of changes, and packfile packing
> decisions.

I feel like I'm missing the explanation of the quadratic part. From
looking at the patch, my guess is:

  1. We're doing a linear walk in a data structure (a "struct
     index_state").

  2. For each element, we look it up in another structure
     ("struct traverse_info") with a linear search.

     That leaves us at O(m*n), but if we assume both are on the same
     order of magnitude, that's quadratic.

  3. The fix works by knowing that once a lookup in (2) fails once, it's
     likely to fail for all the remainder, and we short-cut that case
     and skip out of (1) completely.

But that makes me wonder. Aren't we still quadratic in the case that
ce_in_traverse_path() returns true? If so, would we benefit from either:

  a. Improving the complexity of ce_in_traverse_path, to say O(log n),
     which would give us O(n log n) for the whole operation in all
     cases?

  b. If both lists are already sorted, maybe doing a list-merge to
     compare them in O(2n) time?

I'm fairly ignorant of this part of the code, so there's probably a good
reason why my suggestion is unworkable.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]