Re: [PATCH RESEND v2 1/2] KVM: VMX: use kvm_fast_pio_in for handling IN I/O

Sean Christopherson <sean.j.christopherson@xxxxxxxxx> · Mon, 11 Mar 2019 14:07:32 -0700

On Mon, Mar 11, 2019 at 10:07:41AM -0700, Jim Mattson wrote:
> On Thu, Mar 8, 2018 at 8:57 AM Sean Christopherson
> <sean.j.christopherson@xxxxxxxxx> wrote:
> >
> > Fast emulation of processor I/O for IN was disabled on x86 (both VMX
> > and SVM) some years ago due to a buggy implementation.  The addition
> > of kvm_fast_pio_in(), used by SVM, re-introduced (functional!) fast
> > emulation of IN.  Piggyback SVM's work and use kvm_fast_pio_in() on
> > VMX instead of performing full emulation of IN.
> >
> > Reviewed-by: Paolo Bonzini <pbonzini@xxxxxxxxxx>
> > Signed-off-by: Sean Christopherson <sean.j.christopherson@xxxxxxxxx>
> 
> This commit changes the userspace API in a significant way. Before
> this commit, when kvm exited to userspace with exit_reason KVM_EXIT_IO
> and io.direction KVM_EXIT_IO_IN, the guest %rip pointed to the 'in'
> instruction. Now, the guest %rip will be advanced to the next
> instruction before the exit to userspace. This seems like a bug, but
> I'm not entirely sure, because (1) the details of this API aren't
> documented anywhere, and (2) the behavior for 'in' now matches the
> behavior for 'out' (but 'ins' and 'outs' still retain the original
> 'in' behavior). Moreover, Paolo has checked in a self-test that
> depends on the new behavior. So, maybe I'm wrong in thinking that the
> guest %rip should point to the I/O instruction when kvm exits to
> userspace with exit_reason KVM_EXIT_IO?

Yuck.  Semantically I think it's more correct for %rip to point at the
instruction than exited, e.g. if userspace explodes then it should be
able to yell about the IN/OUT instruction, not some random instruction
following the I/O.

However, after a blaming my way down the rabbit hole, updating %rip at
decode was a deliberate change:

  commit 0967b7bf1c22b55777aba46ff616547feed0b141
  Author: Avi Kivity <avi@xxxxxxxxxxxx>
  Date:   Sat Sep 15 17:34:36 2007 +0300

      KVM: Skip pio instruction when it is emulated, not executed

      If we defer updating rip until pio instructions are executed, we have a
      problem with reset:  a pio reset updates rip, and when the instruction
      completes we skip the emulated instruction, pointing rip somewhere completely
      unrelated.

      Fix by updating rip when we see decode the instruction, not after emulation.

      Signed-off-by: Avi Kivity <avi@xxxxxxxxxxxx>

Which makes sense, e.g. on reset via OUT 92h or CF9h, Qemu doesn't announce
to KVM that it's doing a reset, it just stuffs register state and starts
running again.

So, fixing this for IN is trivial.  One idea for fixing OUT would be to
not skip the instruction in the complete_userspace_io() callback if the
register state implies a reset and the OUT port is a known reset port.

Fun side topic: since OUTSB still increments %rip post-emulate, you can
reproduce the bug fixed by Avi by coercing your guest kernel into issuing
an I/O reset via an OUTSB.