On Thu, Jun 29, 2006 at 11:42:14AM -0400, Nicolas Pitre wrote: > So the negative cache should not be O(N^2) either. It just has to be > O(N*window). Yes, unless we use the cache information to change the window (but I don't think that's worth it, unless our window heuristics are bad). The implementation I posted should already grow to O(N*window), since it only saves what we try (and fail at). > My past experiments showed that the best window size for compression is > a bit larger than the current default of 10. It was rather around 15 > for the kernel repository, with higher values than 15 not providing > significant improvements anymore. But that is clearly a function of the > repository nature (the average number of revisions for each file). But > the window size is directly connected to the computational cost. Increasing the window, of course, has virtually no speed impact on later repacks with the cache in effect. Packing with a window of 15 drops my linux-2.6 pack to 108M from 122M. That helps offset the cache size (though of course it takes longer on the initial pack). > This is way suboptimal. First there is no reason for the cache to ever > grow to N^2. At worst it should be N*10 where 10 is the current window > size. I assumed the window would change over time (though our total is still likely to hang around N*10 rather than N^2). > none of the objects found in a given window. Therefore we only have to > compute a hash for the object names found in that window and store that > in the cache. So the cache entries would then be a pair of sha1: first > the sha1 of the victim object, and the sha1 of all sha1 names for the > objects against which the victim object was found not to delta well > against. This will fail to hit the cache anytime the window changes. How often does the window change? In my test case, I would think anytime I added a bunch of new photos, it would be likely that one of them would make it into the window, thus invalidating the cache entry and forcing me to try against every object in the window (even though I've already tried 9/10). Also, is it true to say "if this object did not delta against this window, it will never delta?" What about interactions with the depth parameter? > So given my GIT repository such a cache would be 7610 * 40 = 304400 > bytes if we stick to the full 40 bytes of sha1 to hash bad combinations. Keep in mind that it will grow every time the window changes. -Peff - : send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html