Re: [PATCH] MIPS: PowerTV: Use fls() carefully where static optimization is required

Shinya Kuribayashi <shinya.kuribayashi.px@xxxxxxxxxxx> · Tue, 06 Jul 2010 20:17:33 +0900

On 7/5/2010 10:35 PM, Maciej W. Rozycki wrote:
>  Well, <asm/mach-malta/cpu-feature-overrides.h> seems to be getting this 
> right.  MIPS IV options are not included as quite rare compared to 
> MIPS32/64 ones and run-time determined defaults apply.  For MIPS32/64 
> configurations compile-time optimisations work as usually.

Good catch, I missed that point.  Then CONFIG_CPU_32/64 used at
irq_ffs() can be replaced with cpu_has_clo_clz without problems.
(eell-designed, anyway)

>>>  It might be simpler just to use __builtin_ffs() for this variant though.  
>>> Inline assembly is better avoided unless absolutely required.  Not even 
>>> mentioning readability.
>>
>> Hm.  It might be simpler, but it's not the purpose of irq_ffs(), IMHO.
> 
>  My point is whenever cpu_has_clo_clz is hardcoded to 1 GCC will expand 
> __builtin_ffs() to CLZ as expected and may potentially be able to optimise 
> it further.  For the fallback path I agree you do not want to use 
> __builtin_ffs() as you want to save processing time of the unneeded bits 

Got it.  I was too lazy, thanks for kind clarification.

As for __builtin_ffs(), in addition to Ralf's comments on GCC
versions, I was thinkg about two things:

1) Can we really get irq_ffs() optimized using __builtin_ffs/fls()?

  __builtin_ffs/fls() will emit CLZ if available, that's fine.
  But we don't want ffs/fls itself here.

  First, contrary to its name, current irq_ffs() implementation
  is very similar to __fls().  It's like a super-subset of __fls()
  only menat for CP0.Status check.  And ffs is basically achieved
  by fls(word & -word).  So __builtin_ffs/fls() could never be
  smaller than current irq_ffs().

  irq_ffs() < __fls() < __fls(w & -w) == __ffs()    # <asm/bitops.h>

  Consequently we can't obtain smaller irq_ffs() using
  __builtin_ffs/fls().

  IIUC, current form of irq_ffs() will be useful in the future.

2) How cpu_has_clo_clz should be handled in the kernel

  This is not limited to cpu_has_clo_clz, but more general issue.
  Using __builtin_foo features allow us to get a variety of GCC
  services, optimizations.  But also means that, it increases
  dependency on the GCC, strictly speaking, dependency on processor
  specifiers (-march=).

  For instance, when cpu_has_clo_clz is specified, we'd like to make
  it to the assembler CLZ, even if it's not supported.  Or we should
  not do that?

  This is already pointed out from Maciej in the previous comment:

  > ".set mips32" looks dodgy here.  For pre-MIPS32/64 platforms this 
  > code should never make it to the assembler and if it did, then a 
  > build-time error is better than a run-time problem.

  I know there's not always one answer, everything should be handled
  on a case by case basis.  I'm just wondering about such things for
  a while (no need to reply)

> -- there's no 8-bit FFS intrinsic let alone a 4-bit one (I'm assuming 
> there is a reason this piece of code does not check lower 4 interrupt 
> inputs).

Failed to understand what you say (sorry!), but as far as I examined
irq_ffs(), it returns from 7 to 0.  Perhaps the following "bits 12..15"
could be replaced with "bits 8..15"?  Or am I missing something?

> /*
>  * Version of ffs that only looks at bits 12..15.
>  */

Once things become clear (for me), I'll make a patchset incorporating
cpu-feature-overrieds.h against PowerTV (with David's SOB of course).
-- 
Shinya Kuribayashi
Renesas Electronics