On Sun, Apr 3, 2011 at 12:28 AM, Dan McGee <dpmcgee@xxxxxxxxx> wrote:
> On Sat, Apr 2, 2011 at 4:28 AM, Nguyen Thai Ngoc Duy <pclouds@xxxxxxxxx> wrote:
>> On Thu, Mar 31, 2011 at 8:38 AM, Dan McGee <dpmcgee@xxxxxxxxx> wrote:
>>> We know our mode entry in our tree objects should be 5 or 6 characters
>>> long. This change both enforces this fact and also unrolls the parsing
>>> of the information giving the compiler more room for optimization of the
>>> operations.
>>
>> I'm skeptical. Did you measure significant gain after this patch? I
>> looked at asm output with -O3 and failed to see the compiler doing
>> anything fancy. Perhaps it's because I'm on x86 with quite small
>> register set.
>
> I'm on x86_64 and was just using -O2; -O3 produces the same output
> actually. You can see it below. I had taken a look at this before I
> submitted, and noticed a few things:
> 1. We do use multiple registers now since we aren't constrained to a loop.
> 2. movzbl (for the string parts) and cmp instructions tend to get
> clustered first.
> 3. movzbl (for the mode shifting) and leal instructions tend to get
> clustered later.
> 4. The normal case now involves no conditional jumps until the ' '
> (space) comparison.
>
> Call these "trivial", but on my worst case operation times went from
> (shown below) 27.41 secs to 26.49 secs. Considering this operation is
> called 530,588,868 times (that is not a typo) during this operation,
> every saved instruction or non-missed branch prediction does seem to
> make a difference.

If it makes it better for you, I'm good.
-- 
Duy
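
For readers following the thread without the patch at hand, here is a rough
sketch of the two parsing styles being compared. It is not the code from the
patch: the function names parse_mode_loop and parse_mode_unrolled are
hypothetical, and the unrolled variant assumes the caller guarantees at least
7 readable bytes at str (a well-formed tree entry has a name, a NUL and an
object id after the mode, but that is an assumption of this sketch). The
unrolled version loads five bytes up front and folds the "is this an octal
digit" checks into a single comparison, which is one way to keep the common
path nearly branch-free until the space check.

```c
#include <assert.h>
#include <stddef.h>

/* Loop style: consume octal digits until the first space. */
static const char *parse_mode_loop(const char *str, unsigned int *modep)
{
    unsigned char c;
    unsigned int mode = 0;

    assert(str != NULL && modep != NULL);
    if (*str == ' ')
        return NULL;
    while ((c = (unsigned char)*str++) != ' ') {
        if (c < '0' || c > '7')
            return NULL;
        mode = (mode << 3) + (c - '0');
    }
    *modep = mode;
    return str; /* points just past the space */
}

/*
 * Unrolled style: require exactly 5 or 6 octal digits, then a space.
 * ASSUMPTION (sketch only): at least 7 bytes are readable at str.
 */
static const char *parse_mode_unrolled(const char *str, unsigned int *modep)
{
    const unsigned char *s = (const unsigned char *)str;
    unsigned int d0, d1, d2, d3, d4, d5, mode;

    assert(str != NULL && modep != NULL);

    /* Non-octal bytes wrap to large values or land in 8..., so they are > 7. */
    d0 = s[0] - '0';
    d1 = s[1] - '0';
    d2 = s[2] - '0';
    d3 = s[3] - '0';
    d4 = s[4] - '0';

    /* One combined check: the OR exceeds 7 iff any digit exceeds 7. */
    if ((d0 | d1 | d2 | d3 | d4) > 7)
        return NULL;
    mode = (d0 << 12) | (d1 << 9) | (d2 << 6) | (d3 << 3) | d4;

    if (s[5] == ' ') {          /* 5-character mode, e.g. "40000 " */
        *modep = mode;
        return str + 6;
    }
    d5 = s[5] - '0';            /* 6-character mode, e.g. "100644 " */
    if (d5 > 7 || s[6] != ' ')
        return NULL;
    *modep = (mode << 3) | d5;
    return str + 7;
}
```

Note that the unrolled variant is stricter than the loop: a short mode such
as "644 " parses with the loop but is rejected by the unrolled version, which
matches the "5 or 6 characters" constraint described in the quoted commit
message. Whether a compiler turns this into the instruction clustering
described above depends on the compiler, flags and target.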