On 21/09/23 17:59, Eric DeVolder wrote: > The design decision to use the atomic lock is described in the comment > from kexec_internal.h, cited above. However, examining the code of > __crash_kexec(): > > if (kexec_trylock()) { > if (kexec_crash_image) { > ... > } > kexec_unlock(); > } > > reveals that the use of kexec_trylock() here is actually a "best effort" > due to the atomic lock. This atomic lock, prior to crash hotplug, > would almost always be assured (another kexec syscall could hold the lock > and prevent this, but that is about it). > > So at the point where the capture kernel would be invoked, if the lock > is not obtained, then kdump doesn't occur. > > It is possible to instead use a mutex with proper waiting, and utilize > mutex_trylock() as the "best effort" in __crash_kexec(). The use of a > mutex then avoids all the lock acquisition problems that were revealed > by the crash hotplug activity. > @Dave thanks for the Cc, I'd have missed this otherwise. Prior to the atomic thingie, we actually had a mutex and did mutex_trylock() in __crash_kexec(). I'm a bit confused as this looks like a revert of 05c6257433b7 ("panic, kexec: make __crash_kexec() NMI safe") with just the helpers kept in - this doesn't seem to address any of the original issues regarding NMIs? Sebastian raised some good points in [1] regarding these issues. The main hurdle pointed out there is, if we end up in the slowpath during the unlock, then we can can up acquiring the ->wait_lock which isn't NMI safe. This is even worse on PREEMPT_RT, as both trylock and the unlock can end up acquiring the ->wait_lock. [1]: https://lore.kernel.org/all/YqyZ%2FUf14qkYtMDX@xxxxxxxxxxxxx/ _______________________________________________ kexec mailing list kexec@xxxxxxxxxxxxxxxxxxx http://lists.infradead.org/mailman/listinfo/kexec