Re: cleaner/better zlib sources?

Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> · Fri, 16 Mar 2007 18:11:50 -0700 (PDT)

On Fri, 16 Mar 2007, Linus Torvalds wrote:
> 
> And yes, it's not hugely faster, but I seem to see *some* difference: this 
> is the real-time of ten runs of 
> 
> 	time git log drivers/usb/ > /dev/null

Damn. I think I know why it's happening, and I'm an idiot. I think it's 
actually an issue I wondered about a *loong* time ago, and then forgot all 
about. And later or Nico made it almost impossible to fix with his "pack 
offset" changes.

The thing that made me realize was one of the callchains into inflate() 
that I looked at:

   (gdb) where
   #0  inflate (strm=0x7fff10d83810, flush=4) at inflate.c:566
   #1  0x000000000044c165 in unpack_compressed_entry (p=0x6d52e0, w_curs=0x7fff10d838e0, curpos=94941911,
       size=<value optimized out>) at sha1_file.c:1348
   #2  0x000000000044c2b6 in unpack_entry (p=0x6d52e0, obj_offset=94941909, type=0x7fff10d85d8c, sizep=0x7fff10d83928)
       at sha1_file.c:1408
   #3  0x000000000044c32e in unpack_entry (p=0x6d52e0, obj_offset=94942707, type=0x7fff10d85d8c, sizep=0x7fff10d83988)
       at sha1_file.c:1373
   #4  0x000000000044c32e in unpack_entry (p=0x6d52e0, obj_offset=94943021, type=0x7fff10d85d8c, sizep=0x7fff10d839e8)
       at sha1_file.c:1373
   #5  0x000000000044c32e in unpack_entry (p=0x6d52e0, obj_offset=94943382, type=0x7fff10d85d8c, sizep=0x7fff10d83a48)
       at sha1_file.c:1373
   #6  0x000000000044c32e in unpack_entry (p=0x6d52e0, obj_offset=94943531, type=0x7fff10d85d8c, sizep=0x7fff10d83aa8)
       at sha1_file.c:1373
   #7  0x000000000044c32e in unpack_entry (p=0x6d52e0, obj_offset=94943622, type=0x7fff10d85d8c, sizep=0x7fff10d83b08)
       at sha1_file.c:1373
   #8  0x000000000044c32e in unpack_entry (p=0x6d52e0, obj_offset=94945357, type=0x7fff10d85d8c, sizep=0x7fff10d83b68)
       at sha1_file.c:1373
   #9  0x000000000044c32e in unpack_entry (p=0x6d52e0, obj_offset=94945447, type=0x7fff10d85d8c, sizep=0x7fff10d83bc8)
       at sha1_file.c:1373
   #10 0x000000000044c32e in unpack_entry (p=0x6d52e0, obj_offset=94945571, type=0x7fff10d85d8c, sizep=0x7fff10d85d80)
       at sha1_file.c:1373
   #11 0x000000000044c3f8 in read_packed_sha1 (sha1=<value optimized out>, type=0x7fff10d85d8c, size=0x7fff10d85d80)
       at sha1_file.c:1567
   #12 0x000000000044c741 in read_sha1_file (sha1=0x7fff10d85d60 "ï¿½Ab\217ï¿½ï¿½236ï¿½ï¿½ï¿½031", type=0x7fff10d85d8c,
       size=0x7fff10d85d80) at sha1_file.c:1636
   ....

and notice the deep recursion in sha1_file.

The way we unpack delta chains is that we do

 - find a delta
 - we apply it to "recursively unpack the thing it's a delta to"

which sounds totally obvious and straightforward, right?

EXCEPT it's actually O(n**2) in the delta depth, because we never save the 
intermediate results, so when we have a delta depth of 10 (our default), 
and we decode a lot of these things, we basically will look up the base 
object 10 times, apply the first delta 9 times, apply the second delta 8 
times, etc etc.. 

I didn't worry about it, because it never actually hit as much of a
performance problem (and when you do a *single* tree operation you'd
never see it anyway: you apply the deltas you need, and nothing else),
but what it means is that we actually call inflate on the chain entries
55 times instead of just doing it 10 times. 

It's also somewhat limited by the delta depth that we enforce anyway (I 
say "somewhat", because we only limit the maximum depth, not the number of 
times an object can be used as a base, and if you use an object as a base 
a thousand times, it will literally be unpacked a thousand times too!

I also didn't worry about it, because I felt that if it became a problem, 
it would be easy to just add a cache of base objects (we probably do *not* 
want to keep the whole unpacked object info in memory all the time just 
because of memory pressure issues, so "cache of base objects" is better). 
However, the "pack file + offset" thing makes it harder to do, since we 
now don't even have the SHA1 of the base object before we unpack it.

But I guess we could just index this by a <packfile, offset> tuple.

Anyway, I bet that this is a much bigger issue than the pack format 
itself (and is largely independent).

		Linus