Michael Holzheu <holzheu at linux.vnet.ibm.com> writes:

> Hello Vivek,
>
> In our tests we ran into the following scenario:
>
> Two CPUs have called panic at the same time. The first CPU called
> crash_kexec() and the second CPU called smp_send_stop() in panic()
> before crash_kexec() finished on the first CPU. So the second CPU
> stopped the first CPU and therefore kdump failed.
>
> 1st CPU:
> panic()->crash_kexec()->mutex_trylock(&kexec_mutex)-> do kdump
>
> 2nd CPU:
> panic()->crash_kexec()->kexec_mutex already held by 1st CPU
>        ->smp_send_stop()-> stop CPU 1 (stop kdump)
>
> How should we fix this problem? One possibility could be to do
> smp_send_stop() before we call crash_kexec().
>
> What do you think?

smp_send_stop is insufficiently reliable to be used before crash_kexec.

My first reaction would be to test oops_in_progress and wait until
oops_in_progress == 1 before calling smp_send_stop.

Eric
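
To make the race and the suggested check concrete, here is a rough toy model in userspace C. It is only a sketch, not kernel code and not a proposed patch: struct panic_model, model_crash_kexec(), panic_original() and panic_guarded() are hypothetical names invented for illustration, and the model assumes that oops_in_progress is incremented once for each CPU that enters panic().

```c
#include <stdbool.h>

/* Toy single-threaded model of the race; none of this is kernel code. */

enum panic_action {
    ACT_KDUMP,      /* this CPU boots the crash kernel */
    ACT_SEND_STOP,  /* this CPU goes on to smp_send_stop() */
    ACT_WAIT        /* this CPU spins and leaves the other CPUs alone */
};

struct panic_model {
    bool kexec_mutex_held;   /* stands in for kexec_mutex */
    bool crash_image_loaded; /* is a kdump kernel available? */
    int  oops_in_progress;   /* assumed: bumped once per CPU entering panic() */
};

/* Models crash_kexec(): returns true if this CPU starts kdump. */
static bool model_crash_kexec(struct panic_model *m)
{
    if (m->kexec_mutex_held)
        return false;            /* the trylock failed, another CPU owns kdump */
    if (!m->crash_image_loaded)
        return false;            /* lock taken and dropped again, nothing to boot */
    m->kexec_mutex_held = true;  /* stays held: kdump does not return */
    return true;
}

/* Behaviour from the report: the CPU that loses the trylock sends the stop. */
enum panic_action panic_original(struct panic_model *m)
{
    m->oops_in_progress++;
    if (model_crash_kexec(m))
        return ACT_KDUMP;
    return ACT_SEND_STOP;
}

/* Suggested check: only send the stop when no other CPU is mid-panic. */
enum panic_action panic_guarded(struct panic_model *m)
{
    m->oops_in_progress++;
    if (model_crash_kexec(m))
        return ACT_KDUMP;
    if (m->oops_in_progress != 1)
        return ACT_WAIT;         /* another CPU is in panic(), possibly in kdump */
    return ACT_SEND_STOP;
}
```

Calling panic_original() twice on a model with a crash image loaded reproduces the reported sequence: the first call takes the kdump path and the second goes on to send the stop. With panic_guarded() the second call waits instead, while a single panicking CPU with no crash image loaded still reaches smp_send_stop.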