Re: rdmsr_safe in Linux PV (under Xen) gets a #GP (was: [Fedora-xen] Running fedora xen on top of KVM?)

On Thu, Sep 17, 2015 at 1:10 PM, Konrad Rzeszutek Wilk
<konrad.wilk@xxxxxxxxxx> wrote:
> On Wed, Sep 16, 2015 at 06:39:03PM -0400, Cole Robinson wrote:
>> On 09/16/2015 05:08 PM, Konrad Rzeszutek Wilk wrote:
>> > On Wed, Sep 16, 2015 at 05:04:31PM -0400, Cole Robinson wrote:
>> >> On 09/16/2015 04:07 PM, M A Young wrote:
>> >>> On Wed, 16 Sep 2015, Cole Robinson wrote:
>> >>>
>> >>>> Unfortunately I couldn't get anything else extra out of xen using any of these
>> >>>> options or the ones Major recommended... in fact I couldn't get anything to
>> >>>> the serial console at all. console=con1 would seem to redirect messages since
>> >>>> they wouldn't show up on the graphical display, but nothing went to the serial
>> >>>> log. Maybe I'm missing something...
>> >>>
>> >>> That should be console=com1 so you have a typo either in this message or
>> >>> in your tests.
>> >>>
>> >>
>> >> Yeah that was it :/ So here's the crash output using -cpu host:
>> >>
>> >> - Cole
>> >>
>>
>> <snip>
>>
>> >> about to get started...
>> >> (XEN) traps.c:459:d0v0 Unhandled general protection fault fault/trap [#13] on
>> >> VCPU 0 [ec=0000]
>> >> (XEN) domain_crash_sync called from entry.S: fault at ffff82d08023a5d3
>> >> create_bounce_frame+0x12b/0x13a
>> >> (XEN) Domain 0 (vcpu#0) crashed on cpu#0:
>> >> (XEN) ----[ Xen-4.5.1  x86_64  debug=n  Not tainted ]----
>> >> (XEN) CPU:    0
>> >> (XEN) RIP:    e033:[<ffffffff810032b0>]
>> >
>> > That is the Linux kernel RIP. Can you figure out what is at ffffffff810032b0?
>> >
>> > gdb vmlinux and then
>> > x/20i 0xffffffff810032b0
>> >
>> > can help with that.
>> >
>>
>> Updated to the latest kernel 4.1.6-201.fc22.x86_64. Trace is now:
>>
>> about to get started...
>> (XEN) traps.c:459:d0v0 Unhandled general protection fault fault/trap [#13] on
>> VCPU 0 [ec=0000]

What exactly does this mean?

>> (XEN) domain_crash_sync called from entry.S: fault at ffff82d08023a5d3
>> create_bounce_frame+0x12b/0x13a
>> (XEN) Domain 0 (vcpu#0) crashed on cpu#0:
>> (XEN) ----[ Xen-4.5.1  x86_64  debug=n  Not tainted ]----
>> (XEN) CPU:    0
>> (XEN) RIP:    e033:[<ffffffff810031f0>]
>> (XEN) RFLAGS: 0000000000000282   EM: 1   CONTEXT: pv guest
>> (XEN) rax: 0000000000000015   rbx: ffffffff81c03e1c   rcx: 00000000c0010112
>> (XEN) rdx: 0000000000000001   rsi: ffffffff81c03e1c   rdi: 00000000c0010112
>> (XEN) rbp: ffffffff81c03df8   rsp: ffffffff81c03da0   r8:  ffffffff81c03e28
>> (XEN) r9:  ffffffff81c03e2c   r10: 0000000000000000   r11: 00000000ffffffff
>> (XEN) r12: ffffffff81d25a60   r13: 0000000004000000   r14: 0000000000000000
>> (XEN) r15: 0000000000000000   cr0: 0000000080050033   cr4: 00000000000406f0
>> (XEN) cr3: 0000000075c0b000   cr2: 0000000000000000
>> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e02b   cs: e033
>> (XEN) Guest stack trace from rsp=ffffffff81c03da0:
>> (XEN)    00000000c0010112 00000000ffffffff 0000000000000000 ffffffff810031f0
>> (XEN)    000000010000e030 0000000000010082 ffffffff81c03de0 000000000000e02b
>> (XEN)    0000000000000000 000000000000000c ffffffff81c03e1c ffffffff81c03e48
>> (XEN)    ffffffff8102a7a4 ffffffff81c03e48 ffffffff8102aa3b ffffffff81c03e48
>> (XEN)    cf1fa5f5e026f464 0000000001000000 ffffffff81c03ef8 0000000004000000
>> (XEN)    0000000000000000 ffffffff81c03e58 ffffffff81d5d142 ffffffff81c03ee8
>> (XEN)    ffffffff81d58b56 0000000000000000 0000000000000000 ffffffff81c03e88
>> (XEN)    ffffffff810f8a39 ffffffff81c03ee8 ffffffff81798b13 ffffffff00000010
>> (XEN)    ffffffff81c03ef8 ffffffff81c03eb8 cf1fa5f5e026f464 ffffffff81f1de9c
>> (XEN)    ffffffffffffffff 0000000000000000 ffffffff81df7920 0000000000000000
>> (XEN)    0000000000000000 ffffffff81c03f28 ffffffff81d51c74 cf1fa5f5e026f464
>> (XEN)    0000000000000000 ffffffff81c03f60 ffffffff81c03f5c 0000000000000000
>> (XEN)    0000000000000000 ffffffff81c03f38 ffffffff81d51339 ffffffff81c03ff8
>> (XEN)    ffffffff81d548b1 0000000000000000 00600f1200000000 0000000100000800
>> (XEN)    0300000100000032 0000000000000005 0000000000000000 0000000000000000
>> (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
>> (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
>> (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
>> (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
>> (XEN)    0f00000060c0c748 ccccccccccccc305 cccccccccccccccc cccccccccccccccc
>> (XEN) Domain 0 crashed: rebooting machine in 5 seconds.
>>
>>
>> gdb output:
>>
>> (gdb) x/20i 0xffffffff810031f0
>>    0xffffffff810031f0 <xen_read_msr_safe+16>: rdmsr
>
> Fantastic! So we have some rdmsr that makes KVM inject a #GP.

What's the scenario?  Is this Xen on KVM?

Why didn't the guest print anything?

Is the issue here that the guest died due to failure to handle an
RDMSR failure or did the *hypervisor* die?

It looks like null_trap_bounce is returning true, which suggests that
the failure is happening before the guest sets up exception handling.

>
> Looking at the stack you have some other values:
> ffffffff81c03de0, ffffffff81c03e1c .. they should correspond
> to other functions calling this one. If you do 'nm --defined vmlinux | grep ffffffff81c03e1'
> that should give an idea where they are. Or use 'gdb'.
>
> That will give us a stack - and we can find what type of MSR
> this is. Oh wait, it is in the registers: 00000000c0010112
>
> OK, so where in the code is that MSR... ah, that looks to be:
>  #define MSR_K8_TSEG_ADDR                0xc0010112
>
> which is called at bsp_init_amd.
>
> I think the problem here is that we are calling the
> 'safe' variant of the MSR read but we still get an injected #GP and
> don't expect that.
>
> I am not really sure what the expected outcome should be here.
>
> CC-ing xen-devel, KVM folks, and Andy, who has been
> mucking around in the _safe* pvops.

It's too early a failure, I think.

Cc: Borislav.  Is TSEG guaranteed to exist?  Can we defer that until
we have exception handling working?  Do we need to rig up exception
handling so that it works earlier (e.g. in early_trap_init, which is
presumably early enough)?  Or is this just a KVM and/or Xen bug?

--Andy