On Thu, 29 Jun 2006, Jeff King wrote:

> On Thu, Jun 29, 2006 at 02:24:57PM -0400, Nicolas Pitre wrote:
>
> > > I assumed the window would change over time (though our total is still
> > > likely to hang around N*10 rather than N^2).
> > It doesn't change unless you force a different window size.
>
> Sorry, I meant "the items in the window for a given object would change
> over time."
>
> > > This will fail to hit the cache anytime the window changes. How often
> > > does the window change? In my test case, I would think anytime I added a
> > > bunch of new photos, it would be likely that one of them would make it
> > > into the window, thus invalidating the cache entry and forcing me to try
> > > against every object in the window (even though I've already tried
> > > 9/10).
> > Sure. But on the lot how often will that happen?
>
> Reasonably often, according to my test. I did this to simulate usage
> over time:
>   - create an empty repo
>   - from my test repo of 515 images, grab 20 at a time and add/commit
>     them
>   - after each commit, record the SHA1 of (object, window[0..n]) for
>     each object to be delta'd
> If doing the cache on the sha1 of the whole window is a good idea, then
> we should see many of the same hashes from commit to commit. If we
> don't, that means the newly added files are being placed in the old
> windows, thus disrupting their hashes.
>
> The results were that there was typically only 1 reusable window each
> time I added 20 files. At that point, caching is largely pointless.

Right. Your use pattern is a special case that doesn't work well with
the whole window hash approach. I'd expect it to work beautifully with
the kernel repository though.

> > And even then, since my suggested method implies only one cache lookup
> > in a much smaller cache instead of 10 lookups in a larger cache for each
> > object it might end up faster overall even if sometimes some windows
> > don't match and deltas are recomputed needlessly.
>
> I didn't benchmark, but I doubt it will have significant impact.
> Especially on my photo test repo, the lookups are dominated by the
> create_delta time by several orders of magnitude.

Again I think it is a repo like the linux kernel that would benefit
more.

> > Of course a greater depth might allow for a hit where there isn't any
> > otherwise. But changing the delta depth is not something someone does
> > that often, and when the depth is changed then you better use -f with
> > git-repack as well which like I said should also ignore the cache.
>
> That sounds reasonable to me for depth. What about other reasons for
> try_delta to fail? Preferred base?

Hmmm. That might need to be dealt with (easily but still).


Nicolas
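
For anyone following the thread, the experiment quoted above (record the
SHA1 of (object, window[0..n]) after each commit and see how many hashes
survive) can be sketched roughly as follows. This is an illustration
only, not git's pack-objects code. The function names are made up, and
two simplifying assumptions are baked in: objects are ordered by plain
name, and an object's window is simply the window_size entries that
precede it in that order.

```python
import hashlib


def window_key(obj, window):
    """Hash an object together with every candidate in its window.

    Hypothetical cache key for the whole-window approach: if any window
    member changes, the key changes and the cached result is missed.
    """
    h = hashlib.sha1()
    h.update(obj + b"\0")
    for candidate in window:
        h.update(candidate + b"\0")
    return h.hexdigest()


def all_window_keys(names, window_size):
    """Return the set of window keys for a collection of object names.

    Assumption: objects are sorted by name, and each object's window is
    the window_size entries immediately before it in that order.
    """
    ordered = sorted(names)
    keys = set()
    for i, obj in enumerate(ordered):
        window = ordered[max(0, i - window_size):i]
        keys.add(window_key(obj, window))
    return keys


def reusable_windows(old_names, new_names, window_size=10):
    """Count window keys that are still valid after new objects arrive."""
    before = all_window_keys(old_names, window_size)
    after = all_window_keys(list(old_names) + list(new_names), window_size)
    return len(before & after)


if __name__ == "__main__":
    old = [f"a{i:03d}".encode() for i in range(0, 100, 2)]

    # New objects sort after every old one: old windows are untouched.
    appended = [f"z{i:03d}".encode() for i in range(20)]
    print("appended:   ", reusable_windows(old, appended))

    # New objects interleave with the old ones: windows get disrupted.
    interleaved = [f"a{i:03d}".encode() for i in range(1, 100, 2)]
    print("interleaved:", reusable_windows(old, interleaved))
```

In this toy model, new objects that sort after all the existing ones
leave every old window key intact, while new objects that interleave
with the old ones invalidate nearly all of them. That is the distinction
being argued about above: how well a whole-window key holds up depends
on where newly added objects land in the sort order.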