On Thu, Dec 09, 2010 at 06:09:38PM +1100, Nick Piggin wrote:
> So replace it with an open-coded byte comparison. This increases code
> size by 24 bytes in the critical __d_lookup_rcu function, but the

Actually, if the loop assumes len is non-zero (which is the case for
dentry compare), then the bloat is only 8 bytes, so not a problem.

Also got numbers versus the vanilla kernel, out of interest.

> speedup is huge, averaging 10 runs of each:
>
> git diff st   user    sys   elapsed   CPU
  vanilla       1.19   3.21    4.47    98.0
> before        1.15   2.57    3.82    97.1
> after         1.14   2.35    3.61    96.8
>
> git diff mt   user    sys   elapsed   CPU
  vanilla       1.57  45.75    3.60    1312
> before        1.27   3.85    1.46     349
> after         1.26   3.54    1.43     333
>

Single-thread elapsed time improvement, vanilla vs vfs: 19.23%. Not quite
as big as the AMD fam10h speedup, probably because Westmere does atomics
so damn quickly.

Multi-thread numbers are no surprise.
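
For readers without the patch at hand, the kind of open-coded byte
comparison discussed above could look roughly like the sketch below. This
is an illustration under stated assumptions, not the code from the patch:
the name name_cmp_nonzero and its signature are hypothetical, and the
caller is assumed to guarantee that matching lengths are never zero (as
noted above for dentry compare). That guarantee is what allows a do/while
loop, which skips the length test before the first byte is compared.

```c
#include <stddef.h>

/*
 * Illustrative sketch only; name_cmp_nonzero is a hypothetical name.
 *
 * Returns 0 if the two names are byte-for-byte equal, 1 otherwise.
 * Assumption: when scount == tcount, the length is non-zero.
 */
static inline int name_cmp_nonzero(const unsigned char *cs, size_t scount,
				   const unsigned char *ct, size_t tcount)
{
	/* Names of different length can never match. */
	if (scount != tcount)
		return 1;

	/* do/while: the first byte is compared without testing the length. */
	do {
		if (*cs != *ct)
			return 1;
		cs++;
		ct++;
		tcount--;
	} while (tcount);

	return 0;
}
```

Calling it with two equal-length, identical names returns 0; any length
mismatch or differing byte returns 1. Passing a zero length for both names
would read one byte past the intended range, so the non-zero assumption
has to hold at every call site.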