On 7/5/2010 10:35 PM, Maciej W. Rozycki wrote: > Well, <asm/mach-malta/cpu-feature-overrides.h> seems to be getting this > right. MIPS IV options are not included as quite rare compared to > MIPS32/64 ones and run-time determined defaults apply. For MIPS32/64 > configurations compile-time optimisations work as usually. Good catch, I missed that point. Then CONFIG_CPU_32/64 used at irq_ffs() can be replaced with cpu_has_clo_clz without problems. (eell-designed, anyway) >>> It might be simpler just to use __builtin_ffs() for this variant though. >>> Inline assembly is better avoided unless absolutely required. Not even >>> mentioning readability. >> >> Hm. It might be simpler, but it's not the purpose of irq_ffs(), IMHO. > > My point is whenever cpu_has_clo_clz is hardcoded to 1 GCC will expand > __builtin_ffs() to CLZ as expected and may potentially be able to optimise > it further. For the fallback path I agree you do not want to use > __builtin_ffs() as you want to save processing time of the unneeded bits Got it. I was too lazy, thanks for kind clarification. As for __builtin_ffs(), in addition to Ralf's comments on GCC versions, I was thinkg about two things: 1) Can we really get irq_ffs() optimized using __builtin_ffs/fls()? __builtin_ffs/fls() will emit CLZ if available, that's fine. But we don't want ffs/fls itself here. First, contrary to its name, current irq_ffs() implementation is very similar to __fls(). It's like a super-subset of __fls() only menat for CP0.Status check. And ffs is basically achieved by fls(word & -word). So __builtin_ffs/fls() could never be smaller than current irq_ffs(). irq_ffs() < __fls() < __fls(w & -w) == __ffs() # <asm/bitops.h> Consequently we can't obtain smaller irq_ffs() using __builtin_ffs/fls(). IIUC, current form of irq_ffs() will be useful in the future. 2) How cpu_has_clo_clz should be handled in the kernel This is not limited to cpu_has_clo_clz, but more general issue. Using __builtin_foo features allow us to get a variety of GCC services, optimizations. But also means that, it increases dependency on the GCC, strictly speaking, dependency on processor specifiers (-march=). For instance, when cpu_has_clo_clz is specified, we'd like to make it to the assembler CLZ, even if it's not supported. Or we should not do that? This is already pointed out from Maciej in the previous comment: > ".set mips32" looks dodgy here. For pre-MIPS32/64 platforms this > code should never make it to the assembler and if it did, then a > build-time error is better than a run-time problem. I know there's not always one answer, everything should be handled on a case by case basis. I'm just wondering about such things for a while (no need to reply) > -- there's no 8-bit FFS intrinsic let alone a 4-bit one (I'm assuming > there is a reason this piece of code does not check lower 4 interrupt > inputs). Failed to understand what you say (sorry!), but as far as I examined irq_ffs(), it returns from 7 to 0. Perhaps the following "bits 12..15" could be replaced with "bits 8..15"? Or am I missing something? > /* > * Version of ffs that only looks at bits 12..15. > */ Once things become clear (for me), I'll make a patchset incorporating cpu-feature-overrieds.h against PowerTV (with David's SOB of course). -- Shinya Kuribayashi Renesas Electronics