On 29/04/19 16:04, Sean Christopherson wrote: > KVM's recent bug fix to update %rip after emulating I/O broke userspace > that relied on the previous behavior of incrementing %rip prior to > exiting to userspace. When running a Windows XP guest on AMD hardware, > Qemu may patch "OUT 0x7E" instructions in reaction to the OUT itself. > Because KVM's old behavior was to increment %rip before exiting to > userspace to handle the I/O, Qemu manually adjusted %rip to account for > the OUT instruction. > > Arguably this is a userspace bug as KVM requires userspace to re-enter > the kernel to complete instruction emulation before taking any other > actions. That being said, this is a bit of a grey area and breaking > userspace that has worked for many years is bad. > > Pre-increment %rip on OUT to port 0x7e before exiting to userspace to > hack around the issue. > > Fixes: 45def77ebf79e ("KVM: x86: update %rip after emulating IO") > Reported-by: Simon Becherer <simon@xxxxxxxxxxx> > Reported-and-tested-by: Iakov Karpov <srid@xxxxxxxxx> > Reported-by: Gabriele Balducci <balducci@xxxxxxxx> > Reported-by: Antti Antinoja <reader@xxxxxxxxxxx> > Cc: stable@xxxxxxxxxxxxxxx > Cc: Takashi Iwai <tiwai@xxxxxxxx> > Cc: Jiri Slaby <jslaby@xxxxxxxx> > Cc: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx> > Signed-off-by: Sean Christopherson <sean.j.christopherson@xxxxxxxxx> > --- > > v2: Add a quirk so that userspace can disable the hack. [Paolo] > > arch/x86/include/uapi/asm/kvm.h | 1 + > arch/x86/kvm/x86.c | 21 +++++++++++++++++++-- > 2 files changed, 20 insertions(+), 2 deletions(-) > > diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h > index dabfcf7c3941..7a0e64ccd6ff 100644 > --- a/arch/x86/include/uapi/asm/kvm.h > +++ b/arch/x86/include/uapi/asm/kvm.h > @@ -381,6 +381,7 @@ struct kvm_sync_regs { > #define KVM_X86_QUIRK_LINT0_REENABLED (1 << 0) > #define KVM_X86_QUIRK_CD_NW_CLEARED (1 << 1) > #define KVM_X86_QUIRK_LAPIC_MMIO_HOLE (1 << 2) > +#define KVM_X86_QUIRK_OUT_7E_INC_RIP (1 << 3) > > #define KVM_STATE_NESTED_GUEST_MODE 0x00000001 > #define KVM_STATE_NESTED_RUN_PENDING 0x00000002 > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > index 9482cb36b92a..57621313dead 100644 > --- a/arch/x86/kvm/x86.c > +++ b/arch/x86/kvm/x86.c > @@ -6539,6 +6539,12 @@ int kvm_emulate_instruction_from_buffer(struct kvm_vcpu *vcpu, > } > EXPORT_SYMBOL_GPL(kvm_emulate_instruction_from_buffer); > > +static int complete_fast_pio_out_port_0x7e(struct kvm_vcpu *vcpu) > +{ > + vcpu->arch.pio.count = 0; > + return 1; > +} > + > static int complete_fast_pio_out(struct kvm_vcpu *vcpu) > { > vcpu->arch.pio.count = 0; > @@ -6555,12 +6561,23 @@ static int kvm_fast_pio_out(struct kvm_vcpu *vcpu, int size, > unsigned long val = kvm_register_read(vcpu, VCPU_REGS_RAX); > int ret = emulator_pio_out_emulated(&vcpu->arch.emulate_ctxt, > size, port, &val, 1); > + if (ret) > + return ret; > > - if (!ret) { > + /* > + * Workaround userspace that relies on old KVM behavior of %rip being > + * incremented prior to exiting to userspace to handle "OUT 0x7e". > + */ > + if (port == 0x7e && > + kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_OUT_7E_INC_RIP)) { > + vcpu->arch.complete_userspace_io = > + complete_fast_pio_out_port_0x7e; > + kvm_skip_emulated_instruction(vcpu); > + } else { > vcpu->arch.pio.linear_rip = kvm_get_linear_rip(vcpu); > vcpu->arch.complete_userspace_io = complete_fast_pio_out; > } > - return ret; > + return 0; > } > > static int complete_fast_pio_in(struct kvm_vcpu *vcpu) > Queued, thanks (it's identical to mine except for the comment). Paolo