From: David Ahern <david.ahern@xxxxxxxxxx>
Date: Wed, 22 Apr 2015 18:29:12 -0600

> On 4/22/15 5:25 PM, David Miller wrote:
>> From: David Ahern <david.ahern@xxxxxxxxxx>
>> Date: Wed, 22 Apr 2015 17:19:23 -0600
>>
>>> Only thing left in my queue is optimized versions of the ffs / fls
>>> families, but that patch is v9b specific, not M7.
>>
>> Something faster than the popc thing in arch/sparc/lib/ffs.S?
>
> hmmm... i saw that, but wasn't clear 1) how it got inserted and 2) the
> overhead of a function call versus inline. Anyways, what I have is the
> same 3 instructions as an inline. But really the __ffs was just along
> for the ride; the focus was on __fls.

Because we must support all processors in a single kernel image, the
called assembler routine that gets patched is the best tradeoff in my
opinion.

I strongly recommend we do the same thing for any optimizations done
to fls*().

>> Are you thinking of using "lzcnt"?  I wasn't impressed with the
>> performance of that instruction last time I played around with it.
>
> A comparison of what I hacked together is attached (columns too wide
> for inline). Data is from a T4-2. It shows lzcnt to be better for
> __fls, fls and fl64.

Cool, is it faster when used in your tests for ffs() too?
--
To unsubscribe from this list: send the line "unsubscribe sparclinux" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
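
For readers following the thread, here is a rough C sketch of the two
bit tricks being discussed: a popcount-based ffs and a
leading-zero-count-based fls. It is not the code in
arch/sparc/lib/ffs.S and not the attached patch. All sketch_* names are
made up for illustration, and the popc/lzcnt helpers are plain software
loops standing in for the hardware instructions, so this shows only the
arithmetic and says nothing about performance.

```c
#include <stdint.h>

/* Software stand-in for a "popc" instruction: count the set bits. */
static unsigned sketch_popc(uint64_t x)
{
    unsigned n = 0;

    while (x) {
        x &= x - 1;     /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Software stand-in for "lzcnt": leading zero bits, 64 when x == 0. */
static unsigned sketch_lzcnt(uint64_t x)
{
    unsigned n = 0;

    if (x == 0)
        return 64;
    while (!(x >> 63)) {
        x <<= 1;
        n++;
    }
    return n;
}

/*
 * ffs-style result: 1-based index of the lowest set bit, 0 if none.
 * ~(x ^ -x) has ones at the lowest set bit and every bit below it,
 * so counting those ones gives the 1-based position.
 */
static unsigned sketch_ffs64(uint64_t x)
{
    if (x == 0)
        return 0;
    return sketch_popc(~(x ^ (0 - x)));
}

/*
 * fls-style result: 1-based index of the highest set bit, 0 if none.
 * With a leading-zero count this is a single subtraction.
 */
static unsigned sketch_fls64(uint64_t x)
{
    return 64 - sketch_lzcnt(x);
}
```

For example, sketch_ffs64(0x18) gives 4 and sketch_fls64(0x18) gives 5.
The 0-based __ffs/__fls variants would be these values minus one for
nonzero input.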