Re: [Question] debugging VM cpu hotplug (#GP -> #DF) which results in reset

Alexander Mikhalitsyn <alexander@xxxxxxxxxxxxx> · Wed, 15 Jun 2022 22:47:57 +0300

Dear Sean,

Thanks a lot for your answer!

On Wed, Jun 15, 2022 at 6:00 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
>
> On Wed, Jun 15, 2022, Alexander Mikhalitsyn wrote:
> > Dear friends,
> >
> > I'm sorry for disturbing you, but I've gotten stuck debugging a KVM
> > problem and am looking for advice. I work mostly on kernel
> > containers/CRIU and am a newbie with KVM, so I believe that I'm missing
> > something very simple.
> >
> > My case:
> > - AMD EPYC 7443P 24-Core Processor (Milan family processor)
> > - OpenVZ kernel (based on RHEL7 3.10.0-1160.53.1) on the Host Node (HN)
> > - Qemu/KVM VM (8 vCPU assigned) with many different kernels from 3.10.0-1160 RHEL7 to mainline 5.18
> >
> > Reproducer (run inside VM):
> > echo 0 > /sys/devices/system/cpu/cpu3/online
> > echo 1 > /sys/devices/system/cpu/cpu3/online <- got reset here
> >
> > *Not* reproducible on:
> > - any Intel which we tried
> > - AMD EPYC 7261 (Rome family)
>
> Hmm, given that Milan is problematic but Rome isn't, that implies the bug is related
> to a feature that's new in Milan.  PCID is the one that comes to mind, and IIRC there
> were issues with PCID (or INVPCID?) in various kernels when running on Milan.
>
> Can you try hiding PCID and INVPCID from the guest?

Yep, I tried disabling the PCID and INVPCID features with the nopcid and
noinvpcid kernel command-line flags.
noinvpcid has no effect on the problem, but nopcid avoids the reset! Fantastic!

Of course, masking the CPU feature on the QEMU side also works:
  <cpu mode='host-model' check='partial'>
    <feature policy='disable' name='pcid'/>
  </cpu>
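
To double-check that the feature is really hidden, the flags can be read
back from /proc/cpuinfo inside the guest. Below is a minimal sketch of such
a check; the helper names (parse_cpu_flags, guest_cpu_flags) are only
illustrative and not part of any existing tool, and it assumes the usual
x86 "flags : ..." line format of /proc/cpuinfo.

```python
def parse_cpu_flags(cpuinfo_text):
    """Return the set of feature flags from the first 'flags' line.

    Assumes the x86 /proc/cpuinfo layout, where each CPU has a line
    of the form 'flags<tabs>: fpu vme ...'.
    """
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def guest_cpu_flags(path="/proc/cpuinfo"):
    """Read the flags of the first CPU listed in the given cpuinfo file."""
    with open(path) as f:
        return parse_cpu_flags(f.read())


def report(flags, names=("pcid", "invpcid")):
    """Map each feature name to 'visible' or 'hidden' for quick printing."""
    return {name: ("visible" if name in flags else "hidden") for name in names}
```

Running report(guest_cpu_flags()) in the guest should show pcid as hidden
both with the nopcid boot flag and with the XML above, if the masking took
effect.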

Now, thanks to your advice, I will try to understand why the PCID
feature breaks VMs. I see that we have some support for this feature in
our host kernel (based on RHEL7 3.10.0-1160.53.1), so we probably have
some bugs or are not handling something PCID-related on the KVM side.

Thanks again, I couldn't have pulled this off without your advice, Sean.

>
> > - without KVM (on Host)
>
> ...
>
> > ==== trace-cmd record -b 20000 -e kvm:kvm_cr -e kvm:kvm_userspace_exit -e probe:* =====
> >
> >              CPU-1834  [003] 69194.833364: kvm_userspace_exit:   reason KVM_EXIT_IO (2)
> >              CPU-1838  [000] 69194.834177: kvm_multiple_exception_L9: (ffffffff814313c6) vcpu=0xffff93ee9a528000
> >              CPU-1838  [000] 69194.834180: kvm_multiple_exception_L41: (ffffffff81431493) vcpu=0xffff93ee9a528000 exception=0xd000001 has_error=0x0 nr=0xd error_code=0x0 has_payload=0x0
> >              CPU-1838  [000] 69194.834195: kvm_multiple_exception_L9: (ffffffff814313c6) vcpu=0xffff93ee9a528000
> >              CPU-1838  [000] 69194.834196: kvm_multiple_exception_L41: (ffffffff81431493) vcpu=0xffff93ee9a528000 exception=0x8000100 has_error=0x0 nr=0x8 error_code=0x0 has_payload=0x0
> >              CPU-1838  [000] 69194.834200: shutdown_interception_L8: (ffffffff8146e4a0)
>
> If you can modify the host kernel, throwing a WARN in kvm_multiple_exception() should
> pinpoint the source of the #GP.  Though you may get unlucky and find that KVM is just
> reflecting an intercepted #GP that was first "injected" by hardware.  Note that this
> could spam the log if KVM is injecting a large number of #GPs.
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 9cea051ca62e..19d959bf97cc 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -612,6 +612,8 @@ static void kvm_multiple_exception(struct kvm_vcpu *vcpu,
>         u32 prev_nr;
>         int class1, class2;
>
> +       WARN_ON(nr == GP_VECTOR);
> +
>         kvm_make_request(KVM_REQ_EVENT, vcpu);
>
>         if (!vcpu->arch.exception.pending && !vcpu->arch.exception.injected) {
>

Thanks! I'll try to play with that.

Best regards,
Alex