Re: cleaner/better zlib sources?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On Thu, 15 Mar 2007, Jeff Garzik wrote:
> 
> ISTR there are a bunch of state transitions per byte, which would make sense
> that it shows up on profiles.

Yeah. I only looked at the kernel sources (which include a cleaned-up 
older version of zlib), but they seemed to match the disassembly that I 
saw when doing the instruction-level profiling.

The code *sometimes* falls through from one state to another, ie there's a 
case where zlib does:

	...
        case DICTID:
            NEEDBITS(32);
            strm->adler = state->check = REVERSE(hold);
            INITBITS();
            state->mode = DICT;
        case DICT:
	...

which will obviously generate fine code. There's a few other examples of 
that, *BUT* most of the stuff is just horrid, like

        case HEAD:
            if (state->wrap == 0) {
                state->mode = TYPEDO;
                break;
            }

where the "break" (which simply breaks out of the case-statement, and thus 
just loops back to it:

    for (;;)
        switch (state->mode) {

thing is nasty nasty nasty).

A trivial thing to do is to just replace such

	state->mode = xxx;
	break;

with

	state->mode = xxx;
	goto xxx_mode;

and all of that complicated run-time code *just*goes*away* and is replaced 
by a relative no-op (ie an unconditional direct jump).

Some of them are slightly more complex, like

	state->mode = hold & 0x200 ? DICTID : TYPE;
	INITBITS();
	break;

which would need to be rewritten as

	old_hold = hold;
	INITBITS();
	if (old_hold & 0x200) {
		state->mode = DICTID;
		goto dictid_mode;
	}
	state->mode = TYPE;
	goto dictid_mode;

but while that looks more complicated on a source code level it's a *lot* 
easier for a CPU to actually execute.

Same obvious performance problems go for

	case STORED:
	   ...
        case COPY:
            copy = state->length;
            if (copy) {
		...
		state->length -= copy;
		break;
	    }
	    Tracev((stderr, "inflate:       stored end\n"));
	    state->mode = TYPE;
	    break;

notice how when you get to the COPY state it will actually have nicely 
fallen through from STORED (one of the places where it does that), but 
then it will go through the expensive indirect jump not just once, but at 
least *TWICE* just to get to the TYPE thing, because you'll first have to 
re-enter COPY with a count of zero to get to the place where it sets TYPE, 
and then does the indirect jump immediately again!

In other words, the code is *incredibly* badly written from a performance 
angle.

Yeah, a perfect compiler could do this all for us even with unchanged zlib 
source code, but (a) gcc isn't perfect and (b) I don't really think 
anything else is either, although if things like this happen as part of 
gzip in SpecInt, I wouldn't be surprised by compilers doing things like 
this just to look good ;)

Especially with something like git, where we know that most of the time we 
have the full input buffer (so inflate() generally won't be called in the 
"middle" of a inflate event), we probably have a very well-defined start 
state too, so we'd actually predict perfectlyon the *first* indirect jump 
if we just never did any more of them.

> I could have sworn that either Matt Mackall or Ben LaHaise had cleaned up the
> existing zlib so much that it was practically a new implementation.  I'm not
> aware of any open source implementations independent of zlib (except maybe
> that C++ behemoth, 7zip).

I looked at the latest release (1.2.3), so either Matt/Ben hasn't been 
merged, or this wasn't part of the thing they looked at.

			Linus
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]