On 09/22/23 at 12:35pm, Eric DeVolder wrote: > > > On 9/22/23 11:28, Valentin Schneider wrote: > > On 21/09/23 17:59, Eric DeVolder wrote: > > > The design decision to use the atomic lock is described in the comment > > > from kexec_internal.h, cited above. However, examining the code of > > > __crash_kexec(): > > > > > > if (kexec_trylock()) { > > > if (kexec_crash_image) { > > > ... > > > } > > > kexec_unlock(); > > > } > > > > > > reveals that the use of kexec_trylock() here is actually a "best effort" > > > due to the atomic lock. This atomic lock, prior to crash hotplug, > > > would almost always be assured (another kexec syscall could hold the lock > > > and prevent this, but that is about it). > > > > > > So at the point where the capture kernel would be invoked, if the lock > > > is not obtained, then kdump doesn't occur. > > > > > > It is possible to instead use a mutex with proper waiting, and utilize > > > mutex_trylock() as the "best effort" in __crash_kexec(). The use of a > > > mutex then avoids all the lock acquisition problems that were revealed > > > by the crash hotplug activity. > > > > > > > @Dave thanks for the Cc, I'd have missed this otherwise. > > > > > > Prior to the atomic thingie, we actually had a mutex and did > > mutex_trylock() in __crash_kexec(). I'm a bit confused as this looks like a > > revert of > > 05c6257433b7 ("panic, kexec: make __crash_kexec() NMI safe") > > with just the helpers kept in - this doesn't seem to address any of the > > original issues regarding NMIs? > > > > Sebastian raised some good points in [1] regarding these issues. > > The main hurdle pointed out there is, if we end up in the slowpath during > > the unlock, then we can can up acquiring the ->wait_lock which isn't NMI > > safe. > > > > This is even worse on PREEMPT_RT, as both trylock and the unlock can end up > > acquiring the ->wait_lock. > > > > [1]: https://lore.kernel.org/all/YqyZ%2FUf14qkYtMDX@xxxxxxxxxxxxx/ > > > Having reviewed the references, it would seem that Baoquan's approach of a new > lock to handle the hotplug activity is the way to go? If so, I have posted a formal one. It's simple and should work to fix the issue. _______________________________________________ kexec mailing list kexec@xxxxxxxxxxxxxxxxxxx http://lists.infradead.org/mailman/listinfo/kexec