On Mon, Mar 11, 2019 at 10:07:41AM -0700, Jim Mattson wrote: > On Thu, Mar 8, 2018 at 8:57 AM Sean Christopherson > <sean.j.christopherson@xxxxxxxxx> wrote: > > > > Fast emulation of processor I/O for IN was disabled on x86 (both VMX > > and SVM) some years ago due to a buggy implementation. The addition > > of kvm_fast_pio_in(), used by SVM, re-introduced (functional!) fast > > emulation of IN. Piggyback SVM's work and use kvm_fast_pio_in() on > > VMX instead of performing full emulation of IN. > > > > Reviewed-by: Paolo Bonzini <pbonzini@xxxxxxxxxx> > > Signed-off-by: Sean Christopherson <sean.j.christopherson@xxxxxxxxx> > > This commit changes the userspace API in a significant way. Before > this commit, when kvm exited to userspace with exit_reason KVM_EXIT_IO > and io.direction KVM_EXIT_IO_IN, the guest %rip pointed to the 'in' > instruction. Now, the guest %rip will be advanced to the next > instruction before the exit to userspace. This seems like a bug, but > I'm not entirely sure, because (1) the details of this API aren't > documented anywhere, and (2) the behavior for 'in' now matches the > behavior for 'out' (but 'ins' and 'outs' still retain the original > 'in' behavior). Moreover, Paolo has checked in a self-test that > depends on the new behavior. So, maybe I'm wrong in thinking that the > guest %rip should point to the I/O instruction when kvm exits to > userspace with exit_reason KVM_EXIT_IO? Yuck. Semantically I think it's more correct for %rip to point at the instruction than exited, e.g. if userspace explodes then it should be able to yell about the IN/OUT instruction, not some random instruction following the I/O. However, after a blaming my way down the rabbit hole, updating %rip at decode was a deliberate change: commit 0967b7bf1c22b55777aba46ff616547feed0b141 Author: Avi Kivity <avi@xxxxxxxxxxxx> Date: Sat Sep 15 17:34:36 2007 +0300 KVM: Skip pio instruction when it is emulated, not executed If we defer updating rip until pio instructions are executed, we have a problem with reset: a pio reset updates rip, and when the instruction completes we skip the emulated instruction, pointing rip somewhere completely unrelated. Fix by updating rip when we see decode the instruction, not after emulation. Signed-off-by: Avi Kivity <avi@xxxxxxxxxxxx> Which makes sense, e.g. on reset via OUT 92h or CF9h, Qemu doesn't announce to KVM that it's doing a reset, it just stuffs register state and starts running again. So, fixing this for IN is trivial. One idea for fixing OUT would be to not skip the instruction in the complete_userspace_io() callback if the register state implies a reset and the OUT port is a known reset port. Fun side topic: since OUTSB still increments %rip post-emulate, you can reproduce the bug fixed by Avi by coercing your guest kernel into issuing an I/O reset via an OUTSB.