First, this is Shawn's patch, not mine. But I still don't think it is that bad even as is, and it was already merged by Junio. I believe it can be made better without segregating big objects, though.

On Mon, 19 Mar 2007, Linus Torvalds wrote:

> And no, I don't have any "hard numbers" to back this up, but if you want
> another argument, realize that delta chains are limited to a depth of ten,
> and if a *single* chain can overflow the delta-chain cache, then the cache
> ends up being 100% *in*effective.

No. You forget what I said about delta direction. Deltas are meant to be ordered so that a newer object is the base for older objects. And we usually parse objects from newer to older ones too.

So let's suppose that the cache has only two entries. If there were only one delta chain in the pack and it were ideally laid out, so that the delta chain effectively goes deeper with older objects as we currently mean it to, then the cache would still be perfectly effective: as we move further back in history, objects deeper in the delta chain are requested, and the delta base a single level up is still there to be used just before being evicted. And so on all through the delta chain (a minimal simulation of this is sketched at the end of this message).

> And do the math: if the object size is
> 10-20% of the total allowed cache size, how many objects in the chain are
> you going to fit in the cache?

Only 5 objects in the 20% case. But as I said, you could effectively have only _one_ entry in the cache and it would still be 100% effective with the single perfect delta chain I described above.

Of course, in practice there is more than a single delta chain, and the delta direction might be the other way around. But on _average_ it should work pretty well.

And notice how blobs are the first victims for cache eviction. So for a mixed tree+blob usage case, if big blobs are going to thrash the cache, then with LRU eviction performance should regulate itself quite well.

Of course a single blob might clear the cache entirely in the worst case. But there is a possibility that this big blob could be the base for the next delta. Adding a size threshold simply throws that possibility away, in which case your performance will suffer _anyway_ because you'll experience the O(n**2) behavior with that delta chain of big blobs. Compared to that, the lost cache of small objects won't even be on the radar. But if you let large objects reach the cache, then there is at least some possibility that those large objects will be the next wanted delta base, saving big on performance, especially if that base was itself a deep delta object.

So to summarize:

- the issue concerns only operations needing blob parsing

- if large deltified blobs are denied the cache, you will _always_ see a sudden and drastic performance degradation when the size threshold is crossed, as the O(n**2) behavior on those big blob deltas will dominate the profile

- if you instead impose no threshold, then performance will degrade smoothly with object size, down to the absolute worst case of a single object in the cache, and in some cases it might even remain quite effective!

I therefore think it is best to let the cache regulate itself with LRU eviction rather than to segregate objects with an arbitrary size threshold. The cache will quickly repopulate itself anyway if the big blob happens to be an oddball in the repository.


Nicolas
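
P.S. To make the single-chain argument above concrete, here is a minimal stand-alone sketch. It is not actual git code; the chain layout, the cache structure, and every name and number in it are made up for illustration. It walks one ideally ordered delta chain from newest to oldest with a tiny cache of reconstructed bases and counts delta applications: even a one-entry cache keeps the walk linear, whereas with no cache at all the same walk shows the O(n**2) behavior discussed above.

/*
 * Hypothetical sketch (not git code): walk a single, ideally ordered
 * delta chain from newest to oldest with a one-entry cache of
 * reconstructed delta bases, and count how many deltas get applied.
 */
#include <stdio.h>
#include <string.h>

#define CHAIN_LEN  10	/* matches the depth-10 chain limit mentioned above */
#define CACHE_SIZE 1	/* even a single slot is enough for this layout */

static int cache[CACHE_SIZE];	/* ids of cached, already reconstructed bases */
static int cache_used;
static int delta_applications;	/* total cost: one per delta applied */

static int cache_lookup(int id)
{
	int i;
	for (i = 0; i < cache_used; i++)
		if (cache[i] == id)
			return 1;
	return 0;
}

static void cache_add(int id)
{
	/* crude LRU: evict the oldest entry when the cache is full */
	if (cache_used == CACHE_SIZE) {
		memmove(cache, cache + 1, (CACHE_SIZE - 1) * sizeof(int));
		cache_used--;
	}
	cache[cache_used++] = id;
}

/*
 * Object CHAIN_LEN-1 (the newest) is stored whole; every older object
 * 'id' is stored as a delta whose base is object 'id + 1'.
 */
static void reconstruct(int id)
{
	if (id == CHAIN_LEN - 1)
		return;			/* the newest object is not a delta */
	if (!cache_lookup(id + 1))
		reconstruct(id + 1);	/* base not cached: re-resolve it */
	delta_applications++;		/* apply this object's delta */
	cache_add(id);			/* the result is the next likely base */
}

int main(void)
{
	int id;

	/* walk history from newer to older, as we usually parse objects */
	for (id = CHAIN_LEN - 1; id >= 0; id--)
		reconstruct(id);

	/*
	 * With the one-entry cache this prints CHAIN_LEN-1 delta
	 * applications (linear in the chain length); make cache_lookup()
	 * always miss and the same walk needs (CHAIN_LEN-1)*CHAIN_LEN/2.
	 */
	printf("%d objects, %d delta applications\n",
	       CHAIN_LEN, delta_applications);
	return 0;
}

It compiles with any C compiler, and CHAIN_LEN and CACHE_SIZE (which must stay at least 1) can be tweaked to see how the count changes.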