Re: Mainline kernel crashes, was Re: RFC: remove set_fs for m68k

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Finn,

On 08/09/21 11:50, Finn Thain wrote:
On Tue, 7 Sep 2021, Michael Schmitz wrote:

Does anyone know what causes this?

Our practice to run interrupt handlers at the IPL of the current
interrupt under service, instead of disabling local interrupts for the
duration of the handler?


Lock contention will happen anyway.

If using spin_trylock() outside of "atomic" context was a bug (really?)
then it can't be used here.

add_interrupt_randomness() relies on interrupts being disabled in __handle_irq_event_percpu(), so this check assumes atomic context.

Our definition of irqs_disabled() invalidates that assumption.

Perhaps add_interrupt_randomness() should use the lock in irq mode, like
the rest of drivers/char/random.c does.

That might deadlock on SMP when a task reading from the random pool holding the lock executes on the same CPU as the interrupt.

In the UP case, the spinlock is optimized away, and other users taking the lock also disable interrupts as you said. That ought to protect against re-entering _mix_pool_bytes(), unless we're re-entering from another, higher priority interrupt. We either need to disable interrupts before entering _mix_pool_bytes() on UP, or treat this spin_trylock() as real lock operation (not optimized away).

I think this is something that should be discussed with Ted Ts'o.


In a related case, I've managed to swap my 'resume_userspace' format error for a nice 'illegal instruction' format error apparently caused by an invalid function pointer in __handle_irq_event_percpu(), just by disabling all interrupts upon entering the auto_inthandler and user_inthandler exception handlers. This bug is quite readily reproduced by running your kernel_coverage.sh script in a loop (panics on the first stress test on the second pass):

Stress run 2
Logging to stress-ng-20210908-0838.log
./kernel-coverage.sh: line 272: lcov: command not found
running --fork 1 --fork-vm -t 60 --timestamp --no-rand-seed --times
stress-ng: 08:40:08.70 info:  [1914] setting to a 60 second run per stressor
stress-ng: 08:40:08.82 info:  [1914] dispatching hogs: 1 fork
packet_write_wait: Connection to 10.1.1.4 port 22: Broken pipe

Why disabling interrupts during interrupt processing would make matters worse doesn't make any sense to me...

Cheers,

	Michael



[Index of Archives]     [Video for Linux]     [Yosemite News]     [Linux S/390]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux