On Sat, 17 Mar 2007, Junio C Hamano wrote:
>
> When unpacking a depth-3 deltified object A, the code finds the
> target object A (which is a delta), asks for its base B, and puts B
> in the cache after using it to reconstitute A. While doing so,
> the first-generation base B is also a delta, so its base C (which
> is a non-delta) is found and placed in the cache. When A is
> returned, the cache has B and C. If you ask for B at this
> point, we read the delta, pick up its base C from the cache,
> apply, and return while putting C back in the cache. If you ask
> for A after that, we do not read from the cache, although it is
> available.

Yes. I debated that a bit with myself, but decided that:

 (a) it probably doesn't really matter a lot (but I don't have the
     numbers)

 (b) trying to *also* fill non-delta-base queries from the delta-base
     cache actually complicates things a lot. Surprisingly much so
     (the current logic of removing the entry from the cache only to
     re-insert it after it has been used makes the memory management
     totally trivial, as you noticed)

 (c) regardless, we could decide to do a more extensive caching layer
     later if we really wanted to, and at that point it probably makes
     more sense to integrate it with the delta-base cache. Most git
     objects are use-once, which is why we really *just* save the flag
     bits and the SHA1 hash name itself in "struct object", but doing a
     generic caching layer for object content would likely obviate the
     need for the current "save_commit_buffer" logic.

That (c) in particular was what made me think that it's better to keep
it simple and obvious for now, since even the simple thing largely
fixes the performance issue. Almost three seconds I felt bad about;
just over a second for something as complex as "git log drivers/usb/"
I just cannot make myself worry about.

> In any case, your code makes deeply deltified packfiles a lot
> more practical. As long as the working set of delta chains fits
> in the cache, after unpacking the longest delta, the objects on
> the chain can be had with one lookup and one delta application.

Yeah. I think it would be good (separately, and as "further tweaks") to:

 - have somebody actually look at hit rates for different repositories
   and hash sizes.

 - possibly allow people to set the hash size as a config option, if it
   turns out that certain repository layouts or usage scenarios end up
   preferring bigger caches.

   For example, it may be that for historical archives you might want
   to have deeper delta chains to make the repository smaller, and if
   they are big anyway, maybe they would prefer a larger-than-normal
   cache as a result. On the other hand, if you are memory-constrained,
   maybe you'd prefer to re-generate the objects and waste a bit of CPU
   rather than cache the results.

But neither of the above is really an argument against the patch, just
a "there's certainly room for more work here if anybody cares".

> Very good job.

I'm pretty happy with the results myself. Partly because the patches
just ended up looking so *nice*.

		Linus
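The memory-management trick mentioned in point (b) above can be shown in a
toy form. The sketch below is not the actual code from the patch; the names
(cache_lookup_and_detach, cache_insert, CACHE_SLOTS) and the fixed 256-slot
table are made up for illustration. It only demonstrates the ownership rule
being described: a cache hit detaches the buffer and hands it to the caller,
and the caller re-inserts it after applying its delta, so the cache never
shares a buffer with anyone and needs no reference counting or LRU lists.

/*
 * delta_cache_sketch.c: illustrative only, not the git implementation.
 * Compile with: cc delta_cache_sketch.c -o delta_cache_sketch
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CACHE_SLOTS 256                 /* small fixed-size hash table */

struct cache_entry {
        unsigned char sha1[20];         /* key: object name */
        void *data;                     /* inflated, delta-applied object */
        unsigned long size;
        int used;
};

static struct cache_entry cache[CACHE_SLOTS];

static unsigned int slot_of(const unsigned char *sha1)
{
        /* SHA-1 bytes are already uniformly distributed */
        return sha1[0] % CACHE_SLOTS;
}

/*
 * On a hit, the entry is removed from the cache and ownership of the
 * buffer passes to the caller; the cache keeps no reference to it.
 */
static void *cache_lookup_and_detach(const unsigned char *sha1,
                                     unsigned long *size)
{
        struct cache_entry *e = &cache[slot_of(sha1)];

        if (!e->used || memcmp(e->sha1, sha1, 20))
                return NULL;            /* miss: caller inflates from the pack */
        e->used = 0;
        *size = e->size;
        return e->data;                 /* caller now owns this buffer */
}

/*
 * After using a base object (typically to apply a delta on top of it),
 * the caller hands the buffer back. Whatever previously occupied the
 * slot is simply freed, because nobody else can be holding it.
 */
static void cache_insert(const unsigned char *sha1, void *data,
                         unsigned long size)
{
        struct cache_entry *e = &cache[slot_of(sha1)];

        if (e->used)
                free(e->data);
        memcpy(e->sha1, sha1, 20);
        e->data = data;
        e->size = size;
        e->used = 1;
}

int main(void)
{
        unsigned char base_sha1[20] = { 0xc0, 0xff, 0xee }; /* fake object name */
        unsigned long size;
        char *base;

        /* pretend we just inflated base object "C" while resolving a chain */
        cache_insert(base_sha1, strdup("contents of base C"), 19);

        /* later, resolving object "B": pick its base up from the cache */
        base = cache_lookup_and_detach(base_sha1, &size);
        if (base) {
                printf("cache hit, %lu bytes: %s\n", size, base);
                /* ... apply the delta here, then give the base back */
                cache_insert(base_sha1, base, size);
        }
        return 0;
}

Because at most one party ever owns a given buffer, eviction is just a
free() on whatever happened to sit in the slot, which is the "totally
trivial" memory management the message refers to.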