On Sun, Sep 12, 2021 at 10:51:21AM +1000, Finn Thain wrote:
However, I did notice that preempt_count_add() and preempt_count_sub()
and the associated macros in linux/preempt.h produce very inefficient
code in the interrupt fast path. Here's one example (there are others).

000323de <irq_enter_rcu>:
   323de:	4e56 0000      	linkw %fp,#0
   323e2:	200f           	movel %sp,%d0
   323e4:	0280 ffff e000 	andil #-8192,%d0
   323ea:	2040           	moveal %d0,%a0
   323ec:	2028 000c      	movel %a0@(12),%d0
   323f0:	0680 0001 0000 	addil #65536,%d0
   323f6:	2140 000c      	movel %d0,%a0@(12)
   323fa:	082a 0001 000f 	btst #1,%a2@(15)
   32400:	670c           	beqs 3240e <irq_enter_rcu+0x30>
   32402:	2028 000c      	movel %a0@(12),%d0
   32406:	2028 000c      	movel %a0@(12),%d0
   3240a:	2028 000c      	movel %a0@(12),%d0
   3240e:	4e5e           	unlk %fp
   32410:	4e75           	rts
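For comparison, here's a toy example (standalone C, not kernel code, and
FORCED_READ is a made-up name) of the difference between plain loads,
which the compiler may merge or drop, and forced volatile loads, which
it has to emit one-for-one:

	/* Illustration only: for scalar types, READ_ONCE() boils down
	 * to essentially this kind of volatile cast. */
	#define FORCED_READ(x) (*(const volatile typeof(x) *)&(x))

	int plain(int *p)
	{
		/* The compiler is free to fold these into one load of *p. */
		return p[0] + p[0] + p[0];
	}

	void forced_discard(int *p)
	{
		/* Three separate load instructions get emitted even though
		 * the results are unused, because volatile accesses may
		 * not be elided. */
		(void)FORCED_READ(p[0]);
		(void)FORCED_READ(p[0]);
		(void)FORCED_READ(p[0]);
	}

The three discarded movel instructions at the end of the listing above
have the same shape as forced_discard().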
If I'm reading the code correctly, this is due to the use of READ_ONCE()
in a bunch of places. I'm pretty sure that forces the compiler to produce
a separate read instruction each time, even if the value isn't used or
the operation could otherwise have been optimized. Reading the output,
it's obvious this isn't useful (particularly the three discarded reads of
the same address), but I think the compiler is just doing what we told it
to do. Each load of %a0@(12) is one of these instances (I think it's the
preempt count in this case). Obviously every one of them would be safe to
optimize given the context, but the compiler doesn't know that.

I suspect this could be optimized by defining alternate versions of some
of the relevant macros that skip the forced read (or do nothing at all)
for cases like this. However, it might be tricky to figure out ways to do
this that won't break something else.

	Brad Boyer
	flar@xxxxxxxxxxxxx
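P.S. Here's a rough sketch of the kind of alternate macro I mean. This is
hypothetical, untested code, not anything that exists in the kernel today;
the generic preempt_count() in asm-generic/preempt.h is (roughly) the same
access with READ_ONCE() wrapped around it:

	/* Hypothetical: a relaxed preempt count read using a plain
	 * (non-volatile) access, so the compiler may merge or discard
	 * redundant loads on paths where that's known to be safe. */
	static __always_inline int preempt_count_relaxed(void)
	{
		return current_thread_info()->preempt_count;
	}

Callers in the interrupt entry path that already know the count can't
change under them could use something like this, but as I said, proving
that it's safe everywhere is the tricky part.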