On Fri, 16 Mar 2007, Linus Torvalds wrote:

> The way we unpack delta chains is that we do
>
>  - find a delta
>  - we apply it to "recursively unpack the thing it's a delta to"
>
> which sounds totally obvious and straightforward, right?
>
> EXCEPT it's actually O(n**2) in the delta depth, because we never save
> the intermediate results, so when we have a delta depth of 10 (our
> default), and we decode a lot of these things, we basically will look
> up the base object 10 times, apply the first delta 9 times, apply the
> second delta 8 times, etc etc..

In the worst case, yes.  (Summing your counts: 10 base lookups plus
9+8+...+1 delta applications, i.e. 55 expansions where 11 would do.)
And if you're walking history then the probability of hitting the worst
case eventually is rather high.

> I also didn't worry about it, because I felt that if it became a
> problem, it would be easy to just add a cache of base objects (we
> probably do *not* want to keep the whole unpacked object info in
> memory all the time just because of memory pressure issues, so "cache
> of base objects" is better).  However, the "pack file + offset" thing
> makes it harder to do, since we now don't even have the SHA1 of the
> base object before we unpack it.
>
> But I guess we could just index this by a <packfile, offset> tuple.

Right.  Should be really trivial to hook into unpack_delta_entry(),
replacing its call to unpack_entry() with a wrapper function that
either returns cached data or populates the cache via unpack_entry()
when no match is found.  Then it would only be a matter of coming up
with a clever cache eviction algorithm.

> Anyway, I bet that this is a much bigger issue than the pack format
> itself (and is largely independent).

Well, I think the pack format issue is significant too.  But because
those are independent issues, the gains in performance will be
additive.


Nicolas
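
To make that concrete, here is a minimal, untested sketch of such a
wrapper.  The helper name cache_or_unpack_entry(), the fixed-size
direct-mapped table, and the hash are placeholders of my own; the
unpack_entry() signature is approximated from sha1_file.c, and the
"eviction algorithm" is simply overwriting whatever collides:

#include <stdlib.h>
#include <string.h>

struct packed_git;                      /* opaque stand-in here */
enum object_type { OBJ_NONE, OBJ_COMMIT, OBJ_TREE, OBJ_BLOB, OBJ_TAG };

/* The existing unpacking routine in sha1_file.c that we wrap. */
extern void *unpack_entry(struct packed_git *p, unsigned long offset,
                          enum object_type *type, unsigned long *sizep);

#define CACHE_SIZE 256                  /* must stay a power of two */

static struct cache_slot {
        struct packed_git *p;           /* NULL means the slot is empty */
        unsigned long offset;
        void *data;                     /* malloc'ed copy of the object */
        unsigned long size;
        enum object_type type;
} cache[CACHE_SIZE];

static unsigned int hash_key(struct packed_git *p, unsigned long offset)
{
        /* Mix the pack pointer and offset; anything cheap will do. */
        unsigned long hash = (unsigned long)p + offset;
        hash += (hash >> 8) + (hash >> 16);
        return hash & (CACHE_SIZE - 1);
}

/*
 * Drop-in replacement for the unpack_entry() call in
 * unpack_delta_entry(): returns a copy of the cached data when the
 * same <packfile, offset> base is requested again, otherwise unpacks
 * it and caches the result.
 */
void *cache_or_unpack_entry(struct packed_git *p, unsigned long offset,
                            enum object_type *type, unsigned long *sizep)
{
        struct cache_slot *slot = &cache[hash_key(p, offset)];
        void *data;

        if (slot->p == p && slot->offset == offset) {
                /* Hit: hand back a copy the caller is free to free(). */
                *type = slot->type;
                *sizep = slot->size;
                data = malloc(slot->size);
                memcpy(data, slot->data, slot->size);
                return data;
        }

        data = unpack_entry(p, offset, type, sizep);
        if (!data)
                return NULL;

        /* "Clever eviction" placeholder: overwrite the colliding slot. */
        free(slot->data);
        slot->data = malloc(*sizep);
        memcpy(slot->data, data, *sizep);
        slot->p = p;
        slot->offset = offset;
        slot->type = *type;
        slot->size = *sizep;
        return data;
}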
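Note that the sketch returns a copy of the cached buffer rather than a
pointer into the cache, on the assumption that the caller frees the
base buffer after applying the delta, as unpack_delta_entry() does
today; that way nothing else in the call chain has to change.  A
direct-mapped table of fixed size also keeps the memory footprint
strictly bounded, which addresses the memory pressure concern above.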