Re: [PATCH RESEND v2 1/2] KVM: VMX: use kvm_fast_pio_in for handling IN I/O

Sean Christopherson <sean.j.christopherson@xxxxxxxxx> · Mon, 11 Mar 2019 14:33:26 -0700

On Mon, Mar 11, 2019 at 02:07:32PM -0700, Sean Christopherson wrote:
> On Mon, Mar 11, 2019 at 10:07:41AM -0700, Jim Mattson wrote:
> > On Thu, Mar 8, 2018 at 8:57 AM Sean Christopherson
> > <sean.j.christopherson@xxxxxxxxx> wrote:
> > >
> > > Fast emulation of processor I/O for IN was disabled on x86 (both VMX
> > > and SVM) some years ago due to a buggy implementation.  The addition
> > > of kvm_fast_pio_in(), used by SVM, re-introduced (functional!) fast
> > > emulation of IN.  Piggyback SVM's work and use kvm_fast_pio_in() on
> > > VMX instead of performing full emulation of IN.
> > >
> > > Reviewed-by: Paolo Bonzini <pbonzini@xxxxxxxxxx>
> > > Signed-off-by: Sean Christopherson <sean.j.christopherson@xxxxxxxxx>
> > 
> > This commit changes the userspace API in a significant way. Before
> > this commit, when kvm exited to userspace with exit_reason KVM_EXIT_IO
> > and io.direction KVM_EXIT_IO_IN, the guest %rip pointed to the 'in'
> > instruction. Now, the guest %rip will be advanced to the next
> > instruction before the exit to userspace. This seems like a bug, but
> > I'm not entirely sure, because (1) the details of this API aren't
> > documented anywhere, and (2) the behavior for 'in' now matches the
> > behavior for 'out' (but 'ins' and 'outs' still retain the original
> > 'in' behavior). Moreover, Paolo has checked in a self-test that
> > depends on the new behavior. So, maybe I'm wrong in thinking that the
> > guest %rip should point to the I/O instruction when kvm exits to
> > userspace with exit_reason KVM_EXIT_IO?
> 
> Yuck.  Semantically I think it's more correct for %rip to point at the
> instruction than exited, e.g. if userspace explodes then it should be
> able to yell about the IN/OUT instruction, not some random instruction
> following the I/O.
> 
> However, after a blaming my way down the rabbit hole, updating %rip at
> decode was a deliberate change:
> 
>   commit 0967b7bf1c22b55777aba46ff616547feed0b141
>   Author: Avi Kivity <avi@xxxxxxxxxxxx>
>   Date:   Sat Sep 15 17:34:36 2007 +0300
> 
>       KVM: Skip pio instruction when it is emulated, not executed
> 
>       If we defer updating rip until pio instructions are executed, we have a
>       problem with reset:  a pio reset updates rip, and when the instruction
>       completes we skip the emulated instruction, pointing rip somewhere completely
>       unrelated.
> 
>       Fix by updating rip when we see decode the instruction, not after emulation.
> 
>       Signed-off-by: Avi Kivity <avi@xxxxxxxxxxxx>
> 
> Which makes sense, e.g. on reset via OUT 92h or CF9h, Qemu doesn't announce
> to KVM that it's doing a reset, it just stuffs register state and starts
> running again.
> 
> So, fixing this for IN is trivial.  One idea for fixing OUT would be to
> not skip the instruction in the complete_userspace_io() callback if the
> register state implies a reset and the OUT port is a known reset port.

A better idea would be to skip the %rip update if it's already dirty.
I think that would work even for things like INIT-SIPI-SIPI reset as
kvm_apic_accept_events() will mark %rip as dirty before the callback
is invoked.

I'll cook up a patch and see what happens...

> Fun side topic: since OUTSB still increments %rip post-emulate, you can
> reproduce the bug fixed by Avi by coercing your guest kernel into issuing
> an I/O reset via an OUTSB.