Re: cleaner/better zlib sources?

Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> · Fri, 16 Mar 2007 09:11:06 -0700 (PDT)

On Thu, 16 Mar 2007, linux@xxxxxxxxxxx wrote:
>
> Er, it's a little hard to see, but zlib spends the bulk of its time
> in inflate_fast().

Not for git.

It may be true for *big* inflate events, but the bulk of git inflates are 
small trees etc. Here is one particular profile:

	CPU: Core 2, speed 1596 MHz (estimated)
	Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit 
	mask of 0x00 (Unhalted core cycles) count 100000
	samples  %        image name               app name                 symbol name
	169540   15.9845  git                      git                      inflate
	138377   13.0464  git                      git                      inflate_fast
	94738     8.9321  git                      git                      inflate_table
	73511     6.9308  git                      git                      strlen
	70398     6.6373  git                      git                      find_pack_entry_one
	59200     5.5815  vmlinux                  vmlinux                  __copy_user_nocache
	45103     4.2524  git                      git                      adler32
	42973     4.0516  git                      git                      memcpy
	23438     2.2098  git                      git                      interesting
	..

so yes, inflate_fast() is certainly up there too, but plain "inflate()" is 
actually more important.

(Btw, to get this level of detail, you need to link statically, at least 
if you are using Fedora Core. If you do the normal dynamic linking, 
oprofile will not be able to show you which function, and will just say 
that 50% of all time was spent in libz.so.1.2.3).

Also note that the above is with oprofile: do *not* bother to try to 
profile using "-pg" and gprof. That changes the binary so much as to be 
useless - small functions get the profiling code overhead and things that 
take no time at all will suddenly become twice or three times as 
expensive. Using oprofile you get fairly correct results.

> The code in inflate.c just handles the last few bytes when near
> one limit or the other.  Are you sure it's a performance problem?

See above. I'm absolutely positive.

(The load for the above was roughly:

	git log drivers/usb/ > /dev/null

on the kernel tree - ie I was mainly testing the commit tree pruning, 
which is one of the most fundamnetal operations, and is what makes 
or breaks the performance of things like "git blame" etc).

I'd obviously *also* like to see inflate_fast() go down, and it has some 
really strange code too, like:

	#  define PUP(a) *++(a)

			...
                            len -= op;
                            do {
                                PUP(out) = PUP(from);
                            } while (--op);
			...

which looks rather odd, wouldn't you say? That's a memcpy. But I 
especially find the nice "unrolled" memcpy interestign:

		...
                    while (len > 2) {
                        PUP(out) = PUP(from);
                        PUP(out) = PUP(from);
                        PUP(out) = PUP(from);
                        len -= 3;
                    }
                    if (len) {
                        PUP(out) = PUP(from);
                        if (len > 1)
                            PUP(out) = PUP(from);
                    }
		...

yeah, that's right - we unroll memcpy() BY DOING IT THREE BYTES AT A TIME!

Whoever wrote that crap must have been on some strange medication. If you 
need to do enough of a memcpy() that you want to unroll it, you sure as 
hell don't want to do it a byte-at-a-time, you want to do it with full 
words etc. And the thing is called "memcpy()", for chrissake!

That's the "optimized" code. Strange.

> There's equivalent inflate code in the PGP 5.0i distribution

Interesting. I'll take a look.

		Linus
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html