On Fri, 16 Mar 2007, Linus Torvalds wrote:

> The way we unpack delta chains is that we do
>
>  - find a delta
>  - we apply it to "recursively unpack the thing it's a delta to"
>
> which sounds totally obvious and straightforward, right?
>
> EXCEPT it's actually O(n**2) in the delta depth, because we never save
> the intermediate results, so when we have a delta depth of 10 (our
> default), and we decode a lot of these things, we basically will look
> up the base object 10 times, apply the first delta 9 times, apply the
> second delta 8 times, etc etc..

In the worst case, yes.  (Summing your counts: 10 base lookups plus
9+8+...+1 delta applications, i.e. 55 expansions where 11 would do.)
And if you're walking history then the probability of hitting the worst
case eventually is rather high.

> I also didn't worry about it, because I felt that if it became a
> problem, it would be easy to just add a cache of base objects (we
> probably do *not* want to keep the whole unpacked object info in
> memory all the time just because of memory pressure issues, so "cache
> of base objects" is better).  However, the "pack file + offset" thing
> makes it harder to do, since we now don't even have the SHA1 of the
> base object before we unpack it.
>
> But I guess we could just index this by a <packfile, offset> tuple.

Right.  Should be really trivial to hook into unpack_delta_entry(),
replacing its call to unpack_entry() with a wrapper function that
either returns cached data or populates the cache via unpack_entry()
when no match is found.  Then it would only be a matter of coming up
with a clever cache eviction algorithm.

> Anyway, I bet that this is a much bigger issue than the pack format
> itself (and is largely independent).

Well, I think the pack format issue is significant too.  But because
those are independent issues, the gains in performance will be
additive.


Nicolas
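
To make that concrete, here is a minimal, untested sketch of such a
wrapper.  The helper name cache_or_unpack_entry(), the fixed-size
direct-mapped table, and the hash are placeholders of my own; the
unpack_entry() signature is approximated from sha1_file.c, and the
"eviction algorithm" is simply overwriting whatever collides:

#include <stdlib.h>
#include <string.h>

struct packed_git;                      /* opaque stand-in here */
enum object_type { OBJ_NONE, OBJ_COMMIT, OBJ_TREE, OBJ_BLOB, OBJ_TAG };

/* The existing unpacking routine in sha1_file.c that we wrap. */
extern void *unpack_entry(struct packed_git *p, unsigned long offset,
                          enum object_type *type, unsigned long *sizep);

#define CACHE_SIZE 256                  /* must stay a power of two */

static struct cache_slot {
        struct packed_git *p;           /* NULL means the slot is empty */
        unsigned long offset;
        void *data;                     /* malloc'ed copy of the object */
        unsigned long size;
        enum object_type type;
} cache[CACHE_SIZE];

static unsigned int hash_key(struct packed_git *p, unsigned long offset)
{
        /* Mix the pack pointer and offset; anything cheap will do. */
        unsigned long hash = (unsigned long)p + offset;
        hash += (hash >> 8) + (hash >> 16);
        return hash & (CACHE_SIZE - 1);
}

/*
 * Drop-in replacement for the unpack_entry() call in
 * unpack_delta_entry(): returns a copy of the cached data when the
 * same <packfile, offset> base is requested again, otherwise unpacks
 * it and caches the result.
 */
void *cache_or_unpack_entry(struct packed_git *p, unsigned long offset,
                            enum object_type *type, unsigned long *sizep)
{
        struct cache_slot *slot = &cache[hash_key(p, offset)];
        void *data;

        if (slot->p == p && slot->offset == offset) {
                /* Hit: hand back a copy the caller is free to free(). */
                *type = slot->type;
                *sizep = slot->size;
                data = malloc(slot->size);
                memcpy(data, slot->data, slot->size);
                return data;
        }

        data = unpack_entry(p, offset, type, sizep);
        if (!data)
                return NULL;

        /* "Clever eviction" placeholder: overwrite the colliding slot. */
        free(slot->data);
        slot->data = malloc(*sizep);
        memcpy(slot->data, data, *sizep);
        slot->p = p;
        slot->offset = offset;
        slot->type = *type;
        slot->size = *sizep;
        return data;
}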
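Note that the sketch returns a copy of the cached buffer rather than a
pointer into the cache, on the assumption that the caller frees the
base buffer after applying the delta, as unpack_delta_entry() does
today; that way nothing else in the call chain has to change.  A
direct-mapped table of fixed size also keeps the memory footprint
strictly bounded, which addresses the memory pressure concern above.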