Re: [RFC] Cache negative delta pairs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Jun 29, 2006 at 11:42:14AM -0400, Nicolas Pitre wrote:

> So the negative cache should not be O(N^2) either.  It just has to be 
> O(N*window).

Yes, unless we use the cache information to change the window (but I
don't think that's worth it, unless our window heuristics are bad).
The implementation I posted should already grow to O(N*window), since it
only saves what we try (and fail at).

> My past experiments showed that the best window size for compression is 
> a bit larger than the current default of 10.  It was rather around 15 
> for the kernel repository, with higher values than 15 not providing 
> significant improvements anymore.  But that is clearly a function of the 
> repository nature (the average number of revisions for each file).  But 
> the window size is directly connected to the computational cost.

Increasing the window, of course, has virtually no speed impact on later
repacks with the cache in effect. Packing with a window of 15 drops my
linux-2.6 pack to 108M from 122M. That helps offset the cache size
(though of course it takes longer on the initial pack).

> This is way suboptimal.  First there is no reason for the cache to ever 
> grow to N^2.  At worst it should be N*10 where 10 is the current window 
> size.

I assumed the window would change over time (though our total is still
likely to hang around N*10 rather than N^2).

> none of the objects found in a given window.  Therefore we only have to 
> compute a hash for the object names found in that window and store that 
> in the cache.  So the cache entries would then be a pair of sha1: first 
> the sha1 of the victim object, and the sha1 of all sha1 names for the 
> objects against which the victim object was found not to delta well 
> against.

This will fail to hit the cache anytime the window changes. How often
does the window change? In my test case, I would think anytime I added a
bunch of new photos, it would be likely that one of them would make it
into the window, thus invalidating the cache entry and forcing me to try
against every object in the window (even though I've already tried
9/10).

Also, is it true to say "if this object did not delta against this
window, it will never delta?" What about interactions with the depth
parameter?

> So given my GIT repository such a cache would be 7610 * 40 = 304400 
> bytes if we stick to the full 40 bytes of sha1 to hash bad combinations.

Keep in mind that it will grow every time the window changes.

-Peff
-
: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]