On 07/11/2013 21:41, Daniel Kiper wrote:
> On Thu, Nov 07, 2013 at 09:25:33PM +0000, Andrew Cooper wrote:
>> On 07/11/13 21:16, Daniel Kiper wrote:
>>> On Wed, Nov 06, 2013 at 02:49:37PM +0000, David Vrabel wrote:
>>>> The series (for Xen 4.4) improves the kexec hypercall by making Xen
>>>> responsible for loading and relocating the image. This allows kexec
>>>> to be usable by pv-ops kernels and should allow kexec to be usable
>>>> from a HVM or PVH privileged domain.
>>>>
>>>> I have now tested this with a Linux kernel image using the VGA console
>>>> which was what was causing problems in v9 (this turned out to be a
>>>> kexec-tools bug).
>>>>
>>>> The required patch series for kexec-tools will be posted shortly and
>>>> are available from the xen-v7 branch of:
>>> In general it works. However, quite often I am not able to execute the
>>> panic kernel. The machine hangs with the following message:
>>>
>>> (XEN) Domain 0 crashed: Executing crash image
>>>
>>> gdb shows:
>>>
>>> (gdb) bt
>>> #0 0xffff82d0801a0092 in do_nmi_crash (regs=<optimized out>) at crash.c:113
>>> #1 0xffff82d0802281d9 in nmi_crash () at entry.S:666
>>> #2 0x0000000000000000 in ?? ()
>>> (gdb)
>>>
>>> Especially the second bt line scares me... ;-)))
>> Why? This is completely normal. If you look in crash.c at that line, it
>> is a for (;;) halt(); loop
> I thought more about this:
>
> #1 0xffff82d0802281d9 in nmi_crash () at entry.S:666
>
> Look at the end of this line... ;-)))

Which line, and what about it? In current master that is a SAVE_ALL, but
as the call to do_nmi_crash has happened, I presume 0xffff82d0802281d9 is
a ud2 instruction in your tree?

>
>> How are you hooking gdb up?
> I am doing tests in QEMU and using QEMU's -gdb option.
>
>>> I have not been able to identify why the NMI was activated because
>>> the stack is completely cleared. I tried to record execution in gdb
>>> but it stops with the following message:
>> NMIs are used for cpu shootdown of the non-crashing cpus. Again, this
>> is not touched by the series.
> Ahh... It makes sense. However, why does the machine hang at this stage?
> Hmmm... Does the CPU sending the NMIs receive one and, instead of
> ignoring it, halt itself?
>
> Daniel

No - there is very clear protection against racing down the crash path.

The crashing CPU forces all other cpus into nmi_crash(), where they stay
until reset. The one cpu which is not executing nmi_crash() is the one
which ends up executing the crash image.

~Andrew