Re: [RFC] Cache negative delta pairs

On Thu, 29 Jun 2006, Jeff King wrote:

> On Thu, Jun 29, 2006 at 02:24:57PM -0400, Nicolas Pitre wrote:
> 
> > > I assumed the window would change over time (though our total is still
> > > likely to hang around N*10 rather than N^2).
> > It doesn't change unless you force a different window size.
> 
> Sorry, I meant "the items in the window for a given object would change
> over time." 
> 
> > > This will fail to hit the cache anytime the window changes. How often
> > > does the window change? In my test case, I would think anytime I added a
> > > bunch of new photos, it would be likely that one of them would make it
> > > into the window, thus invalidating the cache entry and forcing me to try
> > > against every object in the window (even though I've already tried
> > > 9/10).
> > Sure.  But on the whole, how often will that happen?
> 
> Reasonably often, according to my test. I did this to simulate usage
> over time:
>   - create an empty repo
>   - from my test repo of 515 images, grab 20 at a time and add/commit
>     them
>   - after each commit, record the SHA1 of (object, window[0..n]) for
>     each object to be delta'd
> If doing the cache on the sha1 of the whole window is a good idea, then
> we should see many of the same hashes from commit to commit. If we
> don't, that means the newly added files are being placed in the old
> windows, thus disrupting their hashes.
> 
> The results were that there was typically only 1 reusable window each
> time I added 20 files. At that point, caching is largely pointless.

Right.  Your use pattern is a special case that doesn't work well with 
the whole-window hash approach.  I'd expect it to work beautifully with 
the kernel repository though.
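
To make the idea more concrete, here is a rough sketch of what such a 
whole-window key could look like.  None of these names exist in 
pack-objects, and FNV-1a only stands in for SHA-1 to keep the example 
self-contained:

/*
 * Sketch only: the cache key for a target object is derived from the
 * target's SHA-1 plus the SHA-1 of every candidate base currently in
 * the delta window.  As soon as a newly added object displaces one
 * window entry, the key changes and the cached negative result is
 * lost, which is exactly why Jeff's photo test case degrades.
 */
#include <stdint.h>
#include <stddef.h>

struct window_entry {
	unsigned char sha1[20];		/* candidate delta base */
};

static uint64_t fnv1a(uint64_t h, const unsigned char *p, size_t len)
{
	while (len--) {
		h ^= *p++;
		h *= 0x100000001b3ULL;
	}
	return h;
}

/* One key per (target object, whole window) combination. */
static uint64_t window_cache_key(const unsigned char *target_sha1,
				 const struct window_entry *win,
				 unsigned int nr)
{
	uint64_t h = 0xcbf29ce484222325ULL;
	unsigned int i;

	h = fnv1a(h, target_sha1, 20);
	for (i = 0; i < nr; i++)
		h = fnv1a(h, win[i].sha1, 20);
	return h;
}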

> > And even then, since my suggested method implies only one cache lookup 
> > in a much smaller cache instead of 10 lookups in a larger cache for 
> > each object, it might end up faster overall even if sometimes some 
> > windows don't match and deltas are recomputed needlessly.
> 
> I didn't benchmark, but I doubt it will have significant impact.
> Especially on my photo test repo, the lookups are dominated by the
> create_delta time by several orders of magnitude.

Again, I think it is a repo like the Linux kernel that would benefit 
more.
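
Just to illustrate the lookup-cost argument with made-up numbers (this 
is not git code; only the entry counts and probe counts matter):

#include <stdio.h>

int main(void)
{
	unsigned long objects = 100000;	/* arbitrary repo size */
	unsigned long window = 10;	/* default pack window */

	/* per-pair negative cache: bigger, probed once per candidate */
	printf("per-pair:   ~%lu entries, %lu probes per object\n",
	       objects * window, window);

	/* per-window cache: smaller, probed once per target object */
	printf("per-window: ~%lu entries, 1 probe per object\n", objects);
	return 0;
}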

> > Of course a greater depth might allow for a hit where there isn't any 
> > otherwise.  But changing the delta depth is not something someone does 
> > that often, and when the depth is changed you'd better use -f with 
> > git-repack as well, which, like I said, should also ignore the cache.
> 
> That sounds reasonable to me for depth. What about other reasons for
> try_delta to fail? Preferred base?

Hmmm.  That might need to be dealt with (easily, but still).
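
Something along these lines is what I mean by "easily": the cached 
negative result would have to remember the conditions it was computed 
under, and a hit is only usable when the current run is no more 
permissive.  The struct is invented for this mail and doesn't match 
anything in pack-objects:

#include <stdint.h>

struct negative_entry {
	uint64_t window_key;		 /* key over target + window */
	unsigned int max_depth;		 /* --depth in effect at the time */
	unsigned int had_preferred_base; /* preferred bases in the window? */
};

static int negative_hit_still_valid(const struct negative_entry *e,
				    unsigned int cur_depth,
				    unsigned int cur_preferred_base)
{
	if (cur_depth > e->max_depth)
		return 0;	/* a deeper chain might now allow a delta */
	if (cur_preferred_base != e->had_preferred_base)
		return 0;	/* preferred-base handling changed */
	return 1;
}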


Nicolas
