Re: WARNING in kvm_arch_vcpu_ioctl_run (3)

Sean Christopherson <seanjc@xxxxxxxxxx> · Mon, 27 Jun 2022 20:08:39 +0000

On Wed, Jun 22, 2022, Tetsuo Handa wrote:
> On 2018/03/28 16:29, Wanpeng Li wrote:
> >> syzbot dashboard link:
> >> https://syzkaller.appspot.com/bug?extid=760a73552f47a8cd0fd9
> >>
> > Maybe the same as this one. https://lkml.org/lkml/2018/3/21/174 Paolo,
> > any idea against my analysis?
> 
> No progress for 4 years. Did somebody check Wanpeng's analysis ?

The most recent failure is a different bug, the splat Wanpeng debugged requires
unrestricted guest to be disabled, whereas this does not.  Somewhat of a side
topic, if the old bug still exists (the syzkaller reproducer fails with invalid
guest state, so it's not clear whether or not the bug is still a problem),
I suspect this hack-a-fix would handle the Real Mode injection case:

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 735543df829a..58801d3888c8 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8209,7 +8209,7 @@ void kvm_inject_realmode_interrupt(struct kvm_vcpu *vcpu, int irq, int inc_eip)
        ctxt->_eip = ctxt->eip + inc_eip;
        ret = emulate_int_real(ctxt, irq);

-       if (ret != X86EMUL_CONTINUE) {
+       if (ret != X86EMUL_CONTINUE || vcpu->mmio_needed) {
                kvm_make_request(KVM_REQ_TRIPLE_FAULT, vcpu);
        } else {
                ctxt->eip = ctxt->_eip;

If I ever have time and/or get bored, I'll try to repro the realmode bug unless
someone beats me to it.

> Since I'm not familiar with KVM, my questions from different direction...
> 
> 
> 
> syzbot is hitting WARN_ON(vcpu->arch.pio.count || vcpu->mmio_needed) added by
> commit 716d51abff06f484 ("KVM: Provide userspace IO exit completion callback")
> due to vcpu->mmio_needed == true.
> 
> Question 1: what is the intent of checking for vcpu->mmio_needed == false?

It's a sanity check to detect KVM bugs.  If vcpu->mmio_needed is true, KVM needs
to exit to userspace to complete the MMIO operation.  On that exit to userspace,
KVM is supposed to also set a callback to essentially acknowledge that the MMIO
completed.

The issue in this bug is that after setting vcpu->mmio_needed, KVM detects and
injects an exception.  Because of how KVM handles MMIO, unlike MMIO reads, MMIO
writes don't immediately stop emulation.  While odd, it should work because MMIO
writes shouldn't be processed until after all fault checks have passed.  The
underlying bug is that LTR emulation has incorrect ordering and checks for a
non-canonical base _after_ marking the TSS as busy (which triggers MMIO).

So as much as I want to suppress this type of warn by clearing vcpu->mmio_needed
when injecting an exception, I suspect playing whack-a-mole is the right approach
because all those moles are likely bugs :-(  Though one thing we can do is change
the WARN_ON() to a WARN_ON_ONCE() so that kernels outside of panic_on_warn=1 won't
blow up on a buggy/malicious userspace.

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 39ea9138224c..09e4b67b881f 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1699,16 +1699,6 @@ static int __load_segment_descriptor(struct x86_emulate_ctxt *ctxt,
        case VCPU_SREG_TR:
                if (seg_desc.s || (seg_desc.type != 1 && seg_desc.type != 9))
                        goto exception;
-               if (!seg_desc.p) {
-                       err_vec = NP_VECTOR;
-                       goto exception;
-               }
-               old_desc = seg_desc;
-               seg_desc.type |= 2; /* busy */
-               ret = ctxt->ops->cmpxchg_emulated(ctxt, desc_addr, &old_desc, &seg_desc,
-                                                 sizeof(seg_desc), &ctxt->exception);
-               if (ret != X86EMUL_CONTINUE)
-                       return ret;
                break;
        case VCPU_SREG_LDTR:
                if (seg_desc.s || seg_desc.type != 2)
@@ -1749,6 +1739,15 @@ static int __load_segment_descriptor(struct x86_emulate_ctxt *ctxt,
                                ((u64)base3 << 32), ctxt))
                        return emulate_gp(ctxt, 0);
        }
+
+       if (seg == VCPU_SREG_TR) {
+               old_desc = seg_desc;
+               seg_desc.type |= 2; /* busy */
+               ret = ctxt->ops->cmpxchg_emulated(ctxt, desc_addr, &old_desc, &seg_desc,
+                                                 sizeof(seg_desc), &ctxt->exception);
+               if (ret != X86EMUL_CONTINUE)
+                       return ret;
+       }
 load:
        ctxt->ops->set_segment(ctxt, selector, &seg_desc, base3, seg);
        if (desc)


> If we run a reproducer provided by syzbot, we can observe that mutex_unlock(&vcpu->mutex)
> in kvm_vcpu_ioctl() is called with vcpu->mmio_needed == true.
> 
> Question 2: Is kvm_vcpu_ioctl() supposed to leave with vcpu->mmio_needed == false?
> In other words, is doing
> 
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -4104,6 +4104,7 @@ static long kvm_vcpu_ioctl(struct file *filp,
>  		r = kvm_arch_vcpu_ioctl(filp, ioctl, arg);
>  	}
>  out:
> +	WARN_ON_ONCE(vcpu->mmio_needed);
>  	mutex_unlock(&vcpu->mutex);
>  	kfree(fpu);
>  	kfree(kvm_sregs);
> 
> appropriate?

It's not appropriate, mmio_needed is actually supposed to be accompanied by a
exit from kvm_vcpu_ioctl() to userspace.