On Thu, Dec 09, 2010 at 06:09:38PM +1100, Nick Piggin wrote:
> I was actually discussing this with Linus a while back, and finally
> got around to testing it out now that I have a modern CPU to measure
> it on! CCing linux-arch because it would be interesting to know
> whether your tuned functions do better than gcc or not (I would
> suspect not).
>
> BTW. patch and numbers are on top of my scaling series, just for
> an idea of what it does, I just want to generate some interesting
> discussion.
>
> If people are interested in running benchmarks, I'll be pushing out
> a new update soon, after some more testing and debugging here.
>
> The standard memcmp function on a Westmere system shows up hot in
> profiles in the `git diff` workload (both parallel and single threaded),
> and it is likely due to the costs associated with trapping into
> microcode, and little opportunity to improve memory access (dentry
> name is not likely to take up more than a cacheline).
>
> So replace it with an open-coded byte comparison. This increases code
> size by 24 bytes in the critical __d_lookup_rcu function, but the
> speedup is huge, averaging 10 runs of each:
>
> git diff st   user   sys    elapsed  CPU
> before        1.15   2.57   3.82     97.1
> after         1.14   2.35   3.61     96.8
>
> git diff mt   user   sys    elapsed  CPU
> before        1.27   3.85   1.46     349
> after         1.26   3.54   1.43     333
>
> Elapsed time for single threaded git diff at 95.0% confidence:
>         -0.21  +/- 0.01
>         -5.45% +/- 0.24%

Nice.

[..]

> +static inline int dentry_memcmp(const unsigned char *cs,
> +				const unsigned char *ct, size_t count)
> +{
> +	while (count) {
> +		int ret = (*cs != *ct);
> +		if (ret)
> +			return ret;
> +		cs++;
> +		ct++;
> +		count--;
> +	}
> +	return 0;
> +}

we have a memcmp() in lib/string.c. Maybe reuse it from there?

-- 
Regards/Gruss,
Boris.
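
For anyone weighing the two options, here is a small userspace sketch (not
kernel code) that puts the quoted dentry_memcmp() next to a plain byte-wise
memcmp loop. Note that bytewise_memcmp() and compare_examples() are
hypothetical names made up for illustration; the loop is only an assumption
about what a generic C fallback looks like, not a copy of lib/string.c. The
one visible difference is the return value: the open-coded helper reports only
equal/not-equal, while a memcmp-style loop returns a signed difference.

```c
#include <assert.h>
#include <stddef.h>

/* Open-coded comparison from the quoted patch: reports only equal (0)
 * or not equal (1), never which side sorts first. */
static inline int dentry_memcmp(const unsigned char *cs,
				const unsigned char *ct, size_t count)
{
	while (count) {
		int ret = (*cs != *ct);
		if (ret)
			return ret;
		cs++;
		ct++;
		count--;
	}
	return 0;
}

/* Hypothetical stand-in for a generic byte-wise memcmp(): returns the
 * signed difference of the first mismatching bytes (an assumption for
 * illustration, not the kernel's actual implementation). */
static inline int bytewise_memcmp(const void *cs, const void *ct,
				  size_t count)
{
	const unsigned char *su1 = cs, *su2 = ct;
	int res = 0;

	for (; count > 0; ++su1, ++su2, count--) {
		res = *su1 - *su2;
		if (res != 0)
			break;
	}
	return res;
}

/* Both agree on equality, which is all a name lookup needs. */
static inline void compare_examples(void)
{
	const unsigned char a[] = "dentry";
	const unsigned char b[] = "dentrz";

	assert(dentry_memcmp(a, a, 6) == 0);
	assert(bytewise_memcmp(a, a, 6) == 0);
	assert(dentry_memcmp(a, b, 6) == 1);
	assert(bytewise_memcmp(a, b, 6) < 0);	/* 'y' - 'z' == -1 */
}
```

Either form would do for an equality test on dentry names; whether a shared
out-of-line function keeps the measured speedup of the inlined loop is
something only the benchmark can answer.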