Re: [PATCH 1/2] kvm/x86: Hyper-V based guest crash data handling

Andrey Smetanin <asmetanin@xxxxxxxxxxxxx> · Mon, 15 Jun 2015 13:21:46 +0300



Hi Peter,
thank you for comments. See answers below.
On Fri, 2015-06-12 at 16:03 -0700, Peter Hornyack wrote:
> Hi Denis, Andrey, I have a few comments and questions. (re-sending in
> plain-text mode, apologies for sending twice.)
> 
> On Thu, Jun 11, 2015 at 6:18 AM, Denis V. Lunev <den@xxxxxxxxxx> wrote:
> > From: Andrey Smetanin <asmetanin@xxxxxxxxxxxxx>
> >
> > Windows 2012 guests can notify hypervisor about occurred guest crash
> > (Windows bugcheck(BSOD)) by writing specific Hyper-V msrs. This patch does
> > handling of this MSR's by KVM and sending notification to user space that
> > allows to gather Windows guest crash dump by QEMU/LIBVIRT.
> >
> > The idea is to provide functionality equal to pvpanic device without
> > QEMU guest agent for Windows.
> >
> > The idea is borrowed from Linux HyperV bus driver and validated against
> > Windows 2k12.
> >
> > Signed-off-by: Andrey Smetanin <asmetanin@xxxxxxxxxxxxx>
> > Signed-off-by: Denis V. Lunev <den@xxxxxxxxxx>
> > CC: Gleb Natapov <gleb@xxxxxxxxxx>
> > CC: Paolo Bonzini <pbonzini@xxxxxxxxxx>
> > ---
> >  arch/x86/include/uapi/asm/hyperv.h | 10 +++++
> >  arch/x86/kvm/Makefile              |  2 +-
> >  arch/x86/kvm/mshv.c                | 84 ++++++++++++++++++++++++++++++++++++++
> >  arch/x86/kvm/mshv.h                | 32 +++++++++++++++
> >  arch/x86/kvm/x86.c                 | 25 ++++++++++++
> >  include/linux/kvm_host.h           | 17 ++++++++
> >  include/uapi/linux/kvm.h           | 11 +++++
> >  7 files changed, 180 insertions(+), 1 deletion(-)
> >  create mode 100644 arch/x86/kvm/mshv.c
> >  create mode 100644 arch/x86/kvm/mshv.h
> >
> > diff --git a/arch/x86/include/uapi/asm/hyperv.h b/arch/x86/include/uapi/asm/hyperv.h
> > index ce6068d..25f3064 100644
> > --- a/arch/x86/include/uapi/asm/hyp	erv.h
> > +++ b/arch/x86/include/uapi/asm/hyperv.h
> > @@ -199,6 +199,16 @@
> >  #define HV_X64_MSR_STIMER3_CONFIG              0x400000B6
> >  #define HV_X64_MSR_STIMER3_COUNT               0x400000B7
> >
> > +
> > +/* Hypev-V guest crash notification MSR's */
> > +#define HV_X64_MSR_CRASH_P0                    0x40000100
> > +#define HV_X64_MSR_CRASH_P1                    0x40000101
> > +#define HV_X64_MSR_CRASH_P2                    0x40000102
> > +#define HV_X64_MSR_CRASH_P3                    0x40000103
> > +#define HV_X64_MSR_CRASH_P4                    0x40000104
> > +#define HV_X64_MSR_CRASH_CTL                   0x40000105
> > +#define HV_CRASH_CTL_CRASH_NOTIFY              (1ULL << 63)
> > +
> >  #define HV_X64_MSR_HYPERCALL_ENABLE            0x00000001
> >  #define HV_X64_MSR_HYPERCALL_PAGE_ADDRESS_SHIFT        12
> >  #define HV_X64_MSR_HYPERCALL_PAGE_ADDRESS_MASK \
> > diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
> > index 16e8f96..b1ec24d 100644
> > --- a/arch/x86/kvm/Makefile
> > +++ b/arch/x86/kvm/Makefile
> > @@ -12,7 +12,7 @@ kvm-y                 += $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o \
> >  kvm-$(CONFIG_KVM_ASYNC_PF)     += $(KVM)/async_pf.o
> >
> >  kvm-y                  += x86.o mmu.o emulate.o i8259.o irq.o lapic.o \
> > -                          i8254.o ioapic.o irq_comm.o cpuid.o pmu.o
> > +                          i8254.o ioapic.o irq_comm.o cpuid.o pmu.o mshv.o
> >  kvm-$(CONFIG_KVM_DEVICE_ASSIGNMENT)    += assigned-dev.o iommu.o
> >  kvm-intel-y            += vmx.o
> >  kvm-amd-y              += svm.o
> > diff --git a/arch/x86/kvm/mshv.c b/arch/x86/kvm/mshv.c
> > new file mode 100644
> > index 0000000..ad367c44
> > --- /dev/null
> > +++ b/arch/x86/kvm/mshv.c
> > @@ -0,0 +1,84 @@
> > +/*
> > + * KVM Microsoft Hyper-V extended paravirtualization
> > + *
> > + * This work is licensed under the terms of the GNU GPL, version 2.  See
> > + * the COPYING file in the top-level directory.
> > + *
> > + * Copyright (C) 2015 Andrey Smetanin <asmetanin@xxxxxxxxxxxxx>
> > + *
> > + * Authors: Andrey Smetanin asmetanin@xxxxxxxxxxxxx
> > + */
> > +
> > +#include <linux/kvm_host.h>
> > +#include "mshv.h"
> > +
> > +int kvm_mshv_ctx_create(struct kvm *kvm)
> > +{
> > +       struct kvm_mshv_ctx *ctx;
> > +
> > +       ctx = kzalloc(sizeof(struct kvm_mshv_ctx), GFP_KERNEL);
> > +       if (!ctx)
> > +               return -ENOMEM;
> > +
> > +       ctx->kvm = kvm;
> > +       atomic_set(&ctx->crash_pending, 0);
> > +       kvm->mshv_ctx = ctx;
> > +       return 0;
> > +}
> > +
> > +void kvm_mshv_ctx_destroy(struct kvm *kvm)
> > +{
> > +       kfree(kvm->mshv_ctx);
> > +}
> > +
> > +int kvm_mshv_msr_get_crash_ctl(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata)
> > +{
> > +       struct kvm_mshv_ctx *ctx = kvm_vcpu_get_mshv_ctx(vcpu);
> > +
> > +       atomic_set(&ctx->crash_pending, 1);
> 
> Can you explain why crash_pending is needed?
> 
> Do you know what the Windows guest behavior is here? From my reading
> of the Hyper-V TLFS 4.0, the guest will read HV_X64_MSR_CRASH_CTL at
> some point (not necessarily at the time of the crash, potentially at
> boot time?) to determine the crash actions that the hypervisor
> supports. I'm not sure why kvm needs to remember any state on reads
> from HV_X64_MSR_CRASH_CTL.
Thank you for notice, I agree that crash_pending is useless. We'll drop
it and will send new version of patch.
> 
>  > +
> > +       /* Response that KVM ready to receive crash data */
> > +       *pdata = HV_CRASH_CTL_CRASH_NOTIFY;
> 
> The TLFS says that CrashNotify is the only current action that is
> supported on a guest crash, but other actions may be supported in the
> future. I suggest defining something like
> HV_X64_MSR_CRASH_CTL_CONTENTS (see below), which will only have
> HV_CRASH_CTL_CRASH_NOTIFY set for now but may have other bits set in
> the future, and then setting *pdata = HV_X64_MSR_CRASH_CTL_CONTENTS
> here.
> 
Ok, will do
> > +       return 0;
> > +}
> > +
> > +int kvm_mshv_msr_set_crash_ctl(struct kvm_vcpu *vcpu, u32 msr, u64 data)
> > +{
> > +       struct kvm_mshv_ctx *ctx = kvm_vcpu_get_mshv_ctx(vcpu);
> > +
> > +       if (atomic_dec_and_test(&ctx->crash_pending)) {
> 
> The TLFS says: "To invoke a supported hypervisor guest crash action, a
> child partition writes to the HV_X64_MSR_CRASH_CTL register,
> specifying the desired action." However, this implementation doesn't
> actually check the data that's written to this register. The condition
> here should check for "data & HV_CRASH_CTL_CRASH_NOTIFY", right?
> 
Agree, will be done in new version of patch
> I'm still not clear on the need for crash_pending here... maybe there
> are concurrency reasons that I haven't thought through.
> 
Will drop it
> > +               pr_debug("vcpu %p 0x%llx 0x%llx 0x%llx 0x%llx 0x%llx",
> > +                        vcpu, ctx->crash_p0, ctx->crash_p1, ctx->crash_p2,
> > +                        ctx->crash_p3, ctx->crash_p4);
> > +
> > +               /* Crash data almost gathered so notify user space */
> > +               kvm_make_request(KVM_REQ_MSHV_CRASH, vcpu);
> > +       }
> > +
> > +       return 0;
> > +}
> > +
> > +int kvm_mshv_msr_set_crash_data(struct kvm_vcpu *vcpu, u32 msr, u64 data)
> > +{
> > +       struct kvm_mshv_ctx *ctx = kvm_vcpu_get_mshv_ctx(vcpu);
> > +
> > +       switch (msr) {
> > +       case HV_X64_MSR_CRASH_P0:
> > +               ctx->crash_p0 = data;
> > +               return 0;
> > +       case HV_X64_MSR_CRASH_P1:
> > +               ctx->crash_p1 = data;
> > +               return 0;
> > +       case HV_X64_MSR_CRASH_P2:
> > +               ctx->crash_p2 = data;
> > +               return 0;
> > +       case HV_X64_MSR_CRASH_P3:
> > +               ctx->crash_p3 = data;
> > +               return 0;
> > +       case HV_X64_MSR_CRASH_P4:
> > +               ctx->crash_p4 = data;
> > +               return 0;
> > +       default:
> > +               return -EINVAL;
> > +       }
> > +}
> > diff --git a/arch/x86/kvm/mshv.h b/arch/x86/kvm/mshv.h
> > new file mode 100644
> > index 0000000..ce8e7fa
> > --- /dev/null
> > +++ b/arch/x86/kvm/mshv.h
> > @@ -0,0 +1,32 @@
> > +/*
> > + * KVM Microsoft Hyper-V extended paravirtualization
> > + *
> > + * This work is licensed under the terms of the GNU GPL, version 2.  See
> > + * the COPYING file in the top-level directory.
> > + *
> > + * Copyright (C) 2015 Andrey Smetanin <asmetanin@xxxxxxxxxxxxx>
> > + *
> > + * Authors: Andrey Smetanin asmetanin@xxxxxxxxxxxxx
> > + */
> > +
> > +#ifndef __ARCH_X86_KVM_MSHV_H__
> > +#define __ARCH_X86_KVM_MSHV_H__
> > +
> 
> I suggest here:
> 
> #define HV_X64_MSR_CRASH_CTL_CONTENTS  \
>         (HV_CRASH_CTL_CRASH_NOTIFY)
> 
> To allow for more crash actions to be added in the future.
> 
Ok, will do
> > +static inline struct kvm_mshv_ctx *kvm_get_mshv_ctx(struct kvm *vm)
> > +{
> > +       return vm->mshv_ctx;
> > +}
> > +
> > +static inline struct kvm_mshv_ctx *kvm_vcpu_get_mshv_ctx(struct kvm_vcpu *vcpu)
> > +{
> > +       return vcpu->kvm->mshv_ctx;
> > +}
> > +
> > +int kvm_mshv_msr_get_crash_ctl(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata);
> > +int kvm_mshv_msr_set_crash_ctl(struct kvm_vcpu *vcpu, u32 msr, u64 data);
> > +int kvm_mshv_msr_set_crash_data(struct kvm_vcpu *vcpu, u32 msr, u64 data);
> > +
> > +int kvm_mshv_ctx_create(struct kvm *kvm);
> > +void kvm_mshv_ctx_destroy(struct kvm *kvm);
> > +
> > +#endif
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index ea306ad..388b58f 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -28,6 +28,7 @@
> >  #include "x86.h"
> >  #include "cpuid.h"
> >  #include "assigned-dev.h"
> > +#include "mshv.h"
> >
> >  #include <linux/clocksource.h>
> >  #include <linux/interrupt.h>
> > @@ -2338,6 +2339,10 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> >                 } else
> >                         return set_msr_hyperv(vcpu, msr, data);
> >                 break;
> > +       case HV_X64_MSR_CRASH_CTL:
> > +               return kvm_mshv_msr_set_crash_ctl(vcpu, msr, data);
> > +       case HV_X64_MSR_CRASH_P0 ... HV_X64_MSR_CRASH_P4:
> > +               return kvm_mshv_msr_set_crash_data(vcpu, msr, data);
> >         case MSR_IA32_BBL_CR_CTL3:
> >                 /* Drop writes to this legacy MSR -- see rdmsr
> >                  * counterpart for further detail.
> > @@ -2650,6 +2655,8 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata)
> >                 } else
> >                         return get_msr_hyperv(vcpu, msr, pdata);
> >                 break;
> > +       case HV_X64_MSR_CRASH_CTL:
> > +               return kvm_mshv_msr_get_crash_ctl(vcpu, msr, pdata);
> 
> Have the Windows guests that you've tested ever tried to read from the
> data registers HV_X64_MSR_CRASH_P0 ... HV_X64_MSR_CRASH_P4? Is there
> any harm in implementing them for reads? The TLFS doesn't seem to say
> if reads are permitted or not. Not terribly important, but might be
> nice to have.
> 
This MSR's are not read by w2k12 guest, but we'll make placeholder
handler that returns error in that case.
> >         case MSR_IA32_BBL_CR_CTL3:
> >                 /* This legacy MSR exists but isn't fully documented in current
> >                  * silicon.  It is however accessed by winxp in very narrow
> > @@ -6280,6 +6287,20 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
> >                         vcpu_scan_ioapic(vcpu);
> >                 if (kvm_check_request(KVM_REQ_APIC_PAGE_RELOAD, vcpu))
> >                         kvm_vcpu_reload_apic_access_page(vcpu);
> > +               if (kvm_check_request(KVM_REQ_MSHV_CRASH, vcpu)) {
> > +                       struct kvm_mshv_ctx *ctx = kvm_vcpu_get_mshv_ctx(vcpu);
> > +
> > +                       vcpu->run->exit_reason = KVM_EXIT_SYSTEM_EVENT;
> > +                       vcpu->run->system_event.type = KVM_SYSTEM_EVENT_CRASH;
> > +                       vcpu->run->system_event.flags = 0;
> > +                       vcpu->run->system_event.crash.p0 = ctx->crash_p0;
> > +                       vcpu->run->system_event.crash.p1 = ctx->crash_p1;
> > +                       vcpu->run->system_event.crash.p2 = ctx->crash_p2;
> > +                       vcpu->run->system_event.crash.p3 = ctx->crash_p3;
> > +                       vcpu->run->system_event.crash.p4 = ctx->crash_p4;
> > +                       r = 0;
> > +                       goto out;
> > +               }
> >         }
> >
> >         if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {
> > @@ -7418,6 +7439,9 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
> >         if (type)
> >                 return -EINVAL;
> >
> > +       if (kvm_mshv_ctx_create(kvm))
> > +               return -ENOMEM;
> > +
> >         INIT_HLIST_HEAD(&kvm->arch.mask_notifier_list);
> >         INIT_LIST_HEAD(&kvm->arch.active_mmu_pages);
> >         INIT_LIST_HEAD(&kvm->arch.zapped_obsolete_pages);
> > @@ -7484,6 +7508,7 @@ void kvm_arch_sync_events(struct kvm *kvm)
> >
> >  void kvm_arch_destroy_vm(struct kvm *kvm)
> >  {
> > +       kvm_mshv_ctx_destroy(kvm);
> >         if (current->mm == kvm->mm) {
> >                 /*
> >                  * Free memory regions allocated on behalf of userspace,
> > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > index ad45054..83bd7bf 100644
> > --- a/include/linux/kvm_host.h
> > +++ b/include/linux/kvm_host.h
> > @@ -134,6 +134,7 @@ static inline bool is_error_page(struct page *page)
> >  #define KVM_REQ_ENABLE_IBS        23
> >  #define KVM_REQ_DISABLE_IBS       24
> >  #define KVM_REQ_APIC_PAGE_RELOAD  25
> > +#define KVM_REQ_MSHV_CRASH        26
> >
> >  #define KVM_USERSPACE_IRQ_SOURCE_ID            0
> >  #define KVM_IRQFD_RESAMPLE_IRQ_SOURCE_ID       1
> > @@ -343,6 +344,21 @@ struct kvm_memslots {
> >         int used_slots;
> >  };
> >
> > +/*
> > + * Ms hyperv paravirt context
> > + */
> > +struct kvm_mshv_ctx {
> > +       struct kvm      *kvm;
> > +       atomic_t        crash_pending;
> > +
> > +       /* Guest crash related parameters */
> > +       u64             crash_p0;
> > +       u64             crash_p1;
> > +       u64             crash_p2;
> > +       u64             crash_p3;
> > +       u64             crash_p4;
> > +};
> > +
> >  struct kvm {
> >         spinlock_t mmu_lock;
> >         struct mutex slots_lock;
> > @@ -395,6 +411,7 @@ struct kvm {
> >  #endif
> >         long tlbs_dirty;
> >         struct list_head devices;
> > +       struct kvm_mshv_ctx *mshv_ctx;
> >  };
> >
> >  #define kvm_err(fmt, ...) \
> > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> > index 4b60056..12f481b 100644
> > --- a/include/uapi/linux/kvm.h
> > +++ b/include/uapi/linux/kvm.h
> > @@ -317,8 +317,19 @@ struct kvm_run {
> >                 struct {
> >  #define KVM_SYSTEM_EVENT_SHUTDOWN       1
> >  #define KVM_SYSTEM_EVENT_RESET          2
> > +#define KVM_SYSTEM_EVENT_CRASH          3
> >                         __u32 type;
> >                         __u64 flags;
> > +                       union {
> > +                               struct {
> > +                                       /* Guest crash related parameters */
> > +                                       __u64 p0;
> > +                                       __u64 p1;
> > +                                       __u64 p2;
> > +                                       __u64 p3;
> > +                                       __u64 p4;
> > +                               } crash;
> > +                       };
> >                 } system_event;
> >                 /* KVM_EXIT_S390_STSI */
> >                 struct {
> > --
> > 1.9.1
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe kvm" in
> > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> Thanks,
> Pete
Thanks,
Andrey


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html