Re: [PATCH] sparc: perf: Add support M7 processor

David Miller <davem@xxxxxxxxxxxxx> · Wed, 22 Apr 2015 21:39:23 -0400 (EDT)

From: David Ahern <david.ahern@xxxxxxxxxx>
Date: Wed, 22 Apr 2015 18:29:12 -0600

> On 4/22/15 5:25 PM, David Miller wrote:
>> From: David Ahern <david.ahern@xxxxxxxxxx>
>> Date: Wed, 22 Apr 2015 17:19:23 -0600
>>
>>> Only thing left in my queue is optimized versions of the ffs / fls
>>> families, but that patch is v9b specific, not M7.
>>
>> Something faster than the popc thing in arch/sparc/lib/ffs.S?
> 
> hmmm... i saw that, but wasn't clear 1) how it got inserted and 2) the
> overhead of a function call versus inline. Anyways, what I have is the
> same 3 instructions as an inline. But really the __ffs was just along
> for the ride; the focus was on __fls.

Because we must support all processors in a single kernel image, an
out-of-line assembler routine that gets patched for the running CPU is
the best tradeoff, in my opinion.

I strongly recommend we do the same thing for any optimizations done
to fls*().
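
As a rough illustration of that tradeoff, here is a minimal user-space C
sketch, not the kernel's code. A function pointer stands in for the
patched call site; the names fls64_generic, fls64_clz, select_fls64,
my_fls64 and the cpu_has_lzcnt flag are all hypothetical, and
__builtin_clzll is the GCC/Clang builtin, assumed here to operate on a
64-bit unsigned long long.

```c
#include <stdint.h>

/* Portable fallback: 1-based index of the most significant set bit,
 * or 0 when x == 0. Works on any CPU. */
static int fls64_generic(uint64_t x)
{
    int r = 0;

    while (x) {
        x >>= 1;
        r++;
    }
    return r;
}

/* Variant built on a count-leading-zeros primitive. The builtin is
 * undefined for 0, so that case is handled explicitly. */
static int fls64_clz(uint64_t x)
{
    return x ? 64 - __builtin_clzll(x) : 0;
}

/* Hypothetical stand-in for code patching: one call target, chosen
 * once, so a single binary can run on every CPU. */
static int (*fls64_impl)(uint64_t) = fls64_generic;

/* cpu_has_lzcnt is an assumed capability flag, not a real kernel API. */
static void select_fls64(int cpu_has_lzcnt)
{
    fls64_impl = cpu_has_lzcnt ? fls64_clz : fls64_generic;
}

/* Callers always pay for one call, whichever variant is installed. */
static int my_fls64(uint64_t x)
{
    return fls64_impl(x);
}
```

The cost being discussed is visible here: every caller goes through a
call rather than an inline instruction sequence, in exchange for one
image that works on all processors.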

>> Are you thinking of using "lzcnt"?  I wasn't impressed with the
>> performance of that instruction last time I played around with it.
> 
> A comparison of what I hacked together is attached (columns too wide
> for inline). Data is from a T4-2. It shows lzcnt to be better for
> __fls, fls and fls64.

Cool, is it faster when used in your tests for ffs() too?
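
For reference, ffs() can be expressed with either primitive, which is
what makes the comparison meaningful. The following is a minimal C
sketch of the two formulations, assuming the GCC/Clang builtins
__builtin_popcountll and __builtin_clzll on a 64-bit unsigned long
long; the function names are hypothetical and nothing here is taken
from arch/sparc/lib/ffs.S.

```c
#include <stdint.h>

/* ffs via population count: 1-based index of the lowest set bit,
 * or 0 when x == 0. */
static int ffs64_popc(uint64_t x)
{
    if (!x)
        return 0;
    /* ~(x ^ -x) has ones at the lowest set bit and every bit below
     * it, so counting them gives the 1-based position. */
    return __builtin_popcountll(~(x ^ (0 - x)));
}

/* ffs via leading-zero count. */
static int ffs64_clz(uint64_t x)
{
    if (!x)
        return 0;
    /* x & -x isolates the lowest set bit; its distance from the top
     * of the word gives the position. */
    return 64 - __builtin_clzll(x & (0 - x));
}
```

Both take a negate, one logical operation and one counting
instruction, so any difference would come down to the relative cost of
the counting instruction itself.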