On Mon, Mar 11, 2019 at 02:07:32PM -0700, Sean Christopherson wrote: > On Mon, Mar 11, 2019 at 10:07:41AM -0700, Jim Mattson wrote: > > On Thu, Mar 8, 2018 at 8:57 AM Sean Christopherson > > <sean.j.christopherson@xxxxxxxxx> wrote: > > > > > > Fast emulation of processor I/O for IN was disabled on x86 (both VMX > > > and SVM) some years ago due to a buggy implementation. The addition > > > of kvm_fast_pio_in(), used by SVM, re-introduced (functional!) fast > > > emulation of IN. Piggyback SVM's work and use kvm_fast_pio_in() on > > > VMX instead of performing full emulation of IN. > > > > > > Reviewed-by: Paolo Bonzini <pbonzini@xxxxxxxxxx> > > > Signed-off-by: Sean Christopherson <sean.j.christopherson@xxxxxxxxx> > > > > This commit changes the userspace API in a significant way. Before > > this commit, when kvm exited to userspace with exit_reason KVM_EXIT_IO > > and io.direction KVM_EXIT_IO_IN, the guest %rip pointed to the 'in' > > instruction. Now, the guest %rip will be advanced to the next > > instruction before the exit to userspace. This seems like a bug, but > > I'm not entirely sure, because (1) the details of this API aren't > > documented anywhere, and (2) the behavior for 'in' now matches the > > behavior for 'out' (but 'ins' and 'outs' still retain the original > > 'in' behavior). Moreover, Paolo has checked in a self-test that > > depends on the new behavior. So, maybe I'm wrong in thinking that the > > guest %rip should point to the I/O instruction when kvm exits to > > userspace with exit_reason KVM_EXIT_IO? > > Yuck. Semantically I think it's more correct for %rip to point at the > instruction than exited, e.g. if userspace explodes then it should be > able to yell about the IN/OUT instruction, not some random instruction > following the I/O. > > However, after a blaming my way down the rabbit hole, updating %rip at > decode was a deliberate change: > > commit 0967b7bf1c22b55777aba46ff616547feed0b141 > Author: Avi Kivity <avi@xxxxxxxxxxxx> > Date: Sat Sep 15 17:34:36 2007 +0300 > > KVM: Skip pio instruction when it is emulated, not executed > > If we defer updating rip until pio instructions are executed, we have a > problem with reset: a pio reset updates rip, and when the instruction > completes we skip the emulated instruction, pointing rip somewhere completely > unrelated. > > Fix by updating rip when we see decode the instruction, not after emulation. > > Signed-off-by: Avi Kivity <avi@xxxxxxxxxxxx> > > Which makes sense, e.g. on reset via OUT 92h or CF9h, Qemu doesn't announce > to KVM that it's doing a reset, it just stuffs register state and starts > running again. > > So, fixing this for IN is trivial. One idea for fixing OUT would be to > not skip the instruction in the complete_userspace_io() callback if the > register state implies a reset and the OUT port is a known reset port. A better idea would be to skip the %rip update if it's already dirty. I think that would work even for things like INIT-SIPI-SIPI reset as kvm_apic_accept_events() will mark %rip as dirty before the callback is invoked. I'll cook up a patch and see what happens... > Fun side topic: since OUTSB still increments %rip post-emulate, you can > reproduce the bug fixed by Avi by coercing your guest kernel into issuing > an I/O reset via an OUTSB.