On Fri, 10 Sep 2021, Michael Schmitz wrote:
On 10/09/21 10:51, Finn Thain wrote:
On Wed, 8 Sep 2021, Michael Schmitz wrote:
In a related case, I've managed to swap my 'resume_userspace' format
error for a nice 'illegal instruction' format error apparently
caused by an invalid function pointer in
__handle_irq_event_percpu(), just by disabling all interrupts upon
entering the auto_inthandler and user_inthandler exception handlers.
This bug is quite readily reproduced by running your
kernel-coverage.sh script in a loop (panics on the first stress test
on the second pass):
Stress run 2
Logging to stress-ng-20210908-0838.log
./kernel-coverage.sh: line 272: lcov: command not found
running --fork 1 --fork-vm -t 60 --timestamp --no-rand-seed --times
stress-ng: 08:40:08.70 info: [1914] setting to a 60 second run per
stressor
stress-ng: 08:40:08.82 info: [1914] dispatching hogs: 1 fork
packet_write_wait: Connection to 10.1.1.4 port 22: Broken pipe
Why disabling interrupts during interrupt processing would make
matters worse doesn't make any sense to me...
Are you able to reproduce that with a stock mainline build?
I still need to try that. The kernel panic went away when adding
additional local_irq_save()/local_irq_restore() in do_IRQ().
I eventually realized that the code using percpu variables inside
add_interrupt_randomness() outside of the spinlock may also need irqs
disabled. So I've now done as you did, that is,
diff --git a/arch/m68k/kernel/irq.c b/arch/m68k/kernel/irq.c
index 9ab4f550342e..b46d8a57f4da 100644
--- a/arch/m68k/kernel/irq.c
+++ b/arch/m68k/kernel/irq.c
@@ -20,10 +20,13 @@
 asmlinkage void do_IRQ(int irq, struct pt_regs *regs)
 {
 	struct pt_regs *oldregs = set_irq_regs(regs);
+	unsigned long flags;
 
+	local_irq_save(flags);
 	irq_enter();
 	generic_handle_irq(irq);
 	irq_exit();
+	local_irq_restore(flags);
 
 	set_irq_regs(oldregs);
 }
There may be a better way to achieve that. If the final IPL can be found
in regs then it doesn't need to be saved again.
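For illustration only, a rough and untested sketch of that idea. It assumes
the m68k irq flags word is just the raw status register, so that merging back
only the IPL bits (SR bits 8..10) from the SR copy the exception frame already
saved in regs->sr is enough:

asmlinkage void do_IRQ(int irq, struct pt_regs *regs)
{
	struct pt_regs *oldregs = set_irq_regs(regs);
	unsigned long flags;

	local_irq_disable();		/* run the handler at IPL 7 */
	irq_enter();
	generic_handle_irq(irq);
	irq_exit();

	/*
	 * The exception frame already recorded the interrupted context's
	 * status register, so its IPL does not need to be saved a second
	 * time by local_irq_save(); merge it back from regs->sr instead.
	 */
	local_save_flags(flags);
	flags = (flags & ~0x0700) | (regs->sr & 0x0700);
	local_irq_restore(flags);

	set_irq_regs(oldregs);
}

Whether the remaining SR bits can safely be ignored here (trace bits in
particular) would still need checking.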
I haven't looked for a possible entropy pool improvement from correct
locking in random.c -- it would not surprise me if there was one.
But none of this explains the panics I saw, so I went looking for potential
race conditions in the irq_enter_rcu() and irq_exit_rcu() code. I haven't
found the bug yet.
However, I did notice that preempt_count_add(), preempt_count_sub(), and the
associated macros in linux/preempt.h produce very inefficient code in the
interrupt fast path. Here's one example (there are others).
000323de <irq_enter_rcu>:
323de: 4e56 0000 linkw %fp,#0
323e2: 200f movel %sp,%d0
323e4: 0280 ffff e000 andil #-8192,%d0
323ea: 2040 moveal %d0,%a0
323ec: 2028 000c movel %a0@(12),%d0
323f0: 0680 0001 0000 addil #65536,%d0
323f6: 2140 000c movel %d0,%a0@(12)
323fa: 082a 0001 000f btst #1,%a2@(15)
32400: 670c beqs 3240e <irq_enter_rcu+0x30>
32402: 2028 000c movel %a0@(12),%d0
32406: 2028 000c movel %a0@(12),%d0
3240a: 2028 000c movel %a0@(12),%d0
3240e: 4e5e unlk %fp
32410: 4e75 rts
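For reference, the three identical loads at 32402..3240a look like the generic
preempt_count() helper being expanded once per mask in irq_count(). Paraphrasing
(roughly, as of around v5.14 -- the exact definitions depend on kernel version
and config) include/asm-generic/preempt.h and include/linux/preempt.h:

static __always_inline int preempt_count(void)
{
	/* READ_ONCE() forces a fresh load of the counter on every use */
	return READ_ONCE(current_thread_info()->preempt_count);
}

#define nmi_count()	(preempt_count() & NMI_MASK)
#define hardirq_count()	(preempt_count() & HARDIRQ_MASK)
#define softirq_count()	(preempt_count() & SOFTIRQ_MASK)

/* expands to three separate READ_ONCE() loads of the same word */
#define irq_count()	(nmi_count() | hardirq_count() | softirq_count())

Since the result is otherwise unused in this build, only the volatile loads
survive, which would explain the run of identical movel %a0@(12),%d0
instructions above.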