[Xen-devel] [PATCHv10 0/9] Xen: extend kexec hypercall for use with pv-ops kernels

andrew.cooper3@xxxxxxxxxx (Andrew Cooper) · Thu, 7 Nov 2013 21:57:05 +0000

On 07/11/2013 21:41, Daniel Kiper wrote:
> On Thu, Nov 07, 2013 at 09:25:33PM +0000, Andrew Cooper wrote:
>> On 07/11/13 21:16, Daniel Kiper wrote:
>>> On Wed, Nov 06, 2013 at 02:49:37PM +0000, David Vrabel wrote:
>>>> The series (for Xen 4.4) improves the kexec hypercall by making Xen
>>>> responsible for loading and relocating the image.  This allows kexec
>>>> to be usable by pv-ops kernels and should allow kexec to be usable
>>>> from a HVM or PVH privileged domain.
>>>>
>>>> I have now tested this with a Linux kernel image using the VGA console
>>>> which was what was causing problems in v9 (this turned out to be a
>>>> kexec-tools bug).
>>>>
>>>> The required patch series for kexec-tools will be posted shortly and
>>>> is available from the xen-v7 branch of:
>>> In general it works. However, quite often I am not able to execute the
>>> panic kernel. The machine hangs with the following message:
>>>
>>> (XEN) Domain 0 crashed: Executing crash image
>>>
>>> gdb shows:
>>>
>>> (gdb) bt
>>> #0  0xffff82d0801a0092 in do_nmi_crash (regs=<optimized out>) at crash.c:113
>>> #1  0xffff82d0802281d9 in nmi_crash () at entry.S:666
>>> #2  0x0000000000000000 in ?? ()
>>> (gdb)
>>>
>>> Especially the second bt line scares me... ;-)))
>> Why? This is completely normal.  If you look in crash.c at that line, it
>> is a for (;;) halt(); loop
> I thought more about this:
>
> #1  0xffff82d0802281d9 in nmi_crash () at entry.S:666
>
> Look at the end of this line... ;-)))

Which line and what about it?  In current master, that is a SAVE_ALL,
but as the call to do_nmi_crash has happened, I presume
0xffff82d0802281d9 is a ud2 instruction in your tree?

>
>> How are you hooking gdb up?
> I am doing tests in QEMU and using QEMU's -gdb option.
>
>>> I have not been able to identify why the NMI was raised because
>>> the stack is completely cleared. I tried to record execution in gdb
>>> but it stops with the following message:
>> NMIs are used for cpu shootdown of the non-crashing cpus.  Again, this
>> is not touched by the series.
> Ahh... It makes sense. However, why does the machine hang at this stage?
> Hmmm... Does the CPU sending NMIs receive one and, instead of ignoring
> it, halt itself?
>
> Daniel

No - there is very clear protection against racing down the crash path.
The crashing CPU forces all other CPUs into nmi_crash(), where they will
stay until reset.

It is the one CPU which is not executing nmi_crash() which will end up
executing the crash image.
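
To make the above concrete, here is a rough user-space sketch of the idea.
It is not the code in crash.c: NR_CPUS, the state array and the function
names are all made up for illustration, and the "shootdown" is a plain
loop rather than a real NMI. It only shows the shape of the protection,
i.e. the first CPU to claim the crash path wins and every other CPU ends
up parked.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NR_CPUS 4      /* hypothetical CPU count for this simulation */
#define NO_CPU  (-1)   /* nobody has claimed the crash path yet */

enum cpu_state { CPU_RUNNING, CPU_PARKED, CPU_RUNS_CRASH_IMAGE };

/* Which CPU owns the crash path; set exactly once. */
static atomic_int crashing_cpu = NO_CPU;

/* Simulated per-CPU state (static storage, so all start CPU_RUNNING). */
static enum cpu_state state[NR_CPUS];

/* Returns true only for the first CPU that tries to claim the crash path. */
static bool claim_crash_path(int cpu)
{
    int expected = NO_CPU;
    return atomic_compare_exchange_strong(&crashing_cpu, &expected, cpu);
}

/* Simulate 'cpu' entering the crash path. */
static void simulate_crash(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);

    if (!claim_crash_path(cpu)) {
        /* Lost the race: this CPU parks instead of running the image. */
        state[cpu] = CPU_PARKED;
        return;
    }

    /* Stand-in for the NMI shootdown: every other CPU ends up parked,
     * which models sitting in a halt loop until reset. */
    for (int other = 0; other < NR_CPUS; other++)
        if (other != cpu)
            state[other] = CPU_PARKED;

    /* The single CPU that was never parked runs the crash image. */
    state[cpu] = CPU_RUNS_CRASH_IMAGE;
}
```

With this model, calling simulate_crash(2) followed by simulate_crash(1)
leaves CPU 2 as the only one marked CPU_RUNS_CRASH_IMAGE; CPU 1 loses the
compare-and-swap and stays parked.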

~Andrew