On Wed, Feb 03, 2010 at 07:42:51AM -0800, Andrew Morton wrote:
> We didn't deal with it on every architecture, which is something which
> the compiler extension takes care of.
>
> In fact I can't find anywhere where we dealt with it on x86.

Yeah, we talked briefly about using hardware popcnt; see the thread
beginning at
http://linux.derkeiler.com/Mailing-Lists/Kernel/2009-06/msg00245.html
for example. I did an ftrace of the cpumask_weight() calls in sched.c to
see whether there would be a measurable performance gain, but it didn't
seem so at the time. My numbers said ca. 170 hweight calls per second,
and since the <lib/hweight.c> implementations roughly translate to ~20
insns (hweight64 to about ~30), the whole thing wasn't worth the trouble,
considering it would mean either checking binutils versions and slapping
in opcodes, or using gcc intrinsics, which involves gcc version checking.

An alternatives-based solution keyed on the CPUID flag could add the
popcnt opcode without checking any toolchain versions, but what would the
replacement instruction look like? Something like

alternative("call hweightXX", "popcnt", X86_FEATURE_POPCNT)

while making sure the arg is in some register first? Hmm..

--
Regards/Gruss,
Boris.

--
Advanced Micro Devices, Inc.
Operating Systems Research Center
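For a sense of what the ~20-instruction software path mentioned above looks like, here is a minimal sketch of a branch-free "SWAR" population count in the general style of the generic <lib/hweight.c> code. The names sw_hweight32()/sw_hweight64() are hypothetical, and this is an illustration of the technique, not the kernel's exact source.

```c
#include <stdint.h>

/* Hypothetical software fallback: branch-free SWAR population count. */
static unsigned int sw_hweight32(uint32_t w)
{
	/* Each 2-bit field now holds the number of set bits in it (0..2). */
	w = w - ((w >> 1) & 0x55555555u);
	/* Each 4-bit field now holds the count for its nibble (0..4). */
	w = (w & 0x33333333u) + ((w >> 2) & 0x33333333u);
	/* Each byte now holds the count for that byte (0..8, no overflow). */
	w = (w + (w >> 4)) & 0x0f0f0f0fu;
	/* Multiplying by 0x01010101 sums all four bytes into the top byte. */
	return (w * 0x01010101u) >> 24;
}

/* Same idea widened to 64 bits; the byte sum (max 64) ends up in the top byte. */
static unsigned int sw_hweight64(uint64_t w)
{
	w = w - ((w >> 1) & 0x5555555555555555ull);
	w = (w & 0x3333333333333333ull) + ((w >> 2) & 0x3333333333333333ull);
	w = (w + (w >> 4)) & 0x0f0f0f0f0f0f0f0full;
	return (unsigned int)((w * 0x0101010101010101ull) >> 56);
}
```

An alternatives-based scheme as described above would, roughly, leave a call to such a routine in place on CPUs without X86_FEATURE_POPCNT and patch in the single popcnt instruction on CPUs that have it; the register-placement question raised above is exactly what such a sketch does not settle.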