On Wed, 2007-04-25 at 14:33 +0200, Andi Kleen wrote:
> On Wednesday 25 April 2007 13:51:12 Fernando Luis Vázquez Cao wrote:
> > Use safe_apic_wait_icr_idle to check ICR idle bit if the vector is
> > NMI_VECTOR to avoid potential hangups in the event of crash when kdump
> > tries to stop the other CPUs.
>
> But what happens then when this fails? Won't this give another hang?
> Have you tested this?

In kdump the crashing CPU (i.e. the CPU that called crash_kexec) is the
one in charge of rebooting into and executing the dump capture kernel.
Before doing this, it attempts to stop the other CPUs by sending an IPI
with NMI_VECTOR as the vector. The problem is that delivery sometimes
seems to fail, and the crashing CPU then gets stuck waiting for the ICR
status bit to be cleared, which never happens. With this patch, when
safe_apic_wait_icr_idle times out the CPU continues executing and tries
to hand over control to the dump capture kernel as usual.

After applying this patch I have not seen hangs in the reboot path to
the second kernel showing the symptoms mentioned before, but perhaps I
am just being lucky and there is something else going on.
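
To make the timeout behaviour concrete, here is a rough user-space sketch of a bounded wait on the ICR busy bit. It is not the patch itself: the register is replaced by a mock, and the names mock_read_icr, mock_delay, bounded_wait_icr_idle, stop_other_cpus_sketch and the MAX_POLLS limit are all made up for illustration. The only thing it is meant to show is that the caller gets control back whether or not the bit ever clears.

```c
#include <stdint.h>

#define ICR_BUSY  0x01000u  /* delivery-status (busy) bit of the ICR */
#define MAX_POLLS 1000      /* assumed poll limit, for illustration only */

/*
 * Hypothetical stand-in for the hardware register: the number of reads
 * for which the busy bit stays set. A negative value means "never clears",
 * which models the failed delivery described above.
 */
static int mock_busy_polls;

static uint32_t mock_read_icr(void)
{
    if (mock_busy_polls < 0)
        return ICR_BUSY;
    if (mock_busy_polls > 0) {
        mock_busy_polls--;
        return ICR_BUSY;
    }
    return 0;
}

/* Placeholder for the short delay a real implementation would insert. */
static void mock_delay(void)
{
}

/*
 * Poll the busy bit at most MAX_POLLS times.
 * Returns 0 if the ICR went idle, non-zero if the wait timed out.
 */
static uint32_t bounded_wait_icr_idle(void)
{
    uint32_t send_status;
    int polls = 0;

    do {
        send_status = mock_read_icr() & ICR_BUSY;
        if (!send_status)
            break;
        mock_delay();
    } while (++polls < MAX_POLLS);

    return send_status;
}

/*
 * Sketch of the crash path: wait for the ICR, then carry on either way.
 * Returns 0 if the ICR was idle, -1 if the wait timed out. In both cases
 * the caller regains control instead of spinning forever.
 */
static int stop_other_cpus_sketch(int busy_polls)
{
    mock_busy_polls = busy_polls;
    if (bounded_wait_icr_idle())
        return -1;  /* timed out; proceed to the capture kernel anyway */
    return 0;
}
```

With the mock set to "never clears" (a negative argument), stop_other_cpus_sketch returns -1 after MAX_POLLS reads rather than hanging, which is the behaviour the reply above relies on.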