On Wed, Jul 24, 2024 at 08:33:48AM -0700, Sean Christopherson wrote: > On Wed, Jul 24, 2024, Tao Su wrote: > > When emulating POPA and exiting to userspace for MMIO, reset modified RSP > > as emulation context may not be reset. POPA may generate more multiple > > reads, i.e. multiple POPs from the stack, but if stack points to MMIO, > > KVM needs to emulate multiple MMIO reads. > > > > When one MMIO done, POPA may be re-emulated with EMULTYPE_NO_DECODE set, > > i.e. ctxt will not be reset, but RSP is modified by previous emulation of > > current POPA instruction, which eventually leads to emulation error. > > > > The commit 0dc902267cb3 ("KVM: x86: Suppress pending MMIO write exits if > > emulator detects exception") provides a detailed analysis of how KVM > > emulates multiple MMIO reads, and its correctness can be verified in the > > POPA instruction with this patch. > > I don't see how this can work. If POPA is reading from MMIO, it will need to > do 8 distinct emulated MMIO accesses. Unwinding to the original RSP will allow > the first MMIO (store to EDI) to succeed, but then the second MMIO (store to ESI) > will exit back to userspace. And the second restart will load EDI with the > result of the MMIO, not ESI. It will also re-trigger the second MMIO indefinitely. > This can work like commit 0dc902267cb3 description. Since there is a buffer (ctxt->mem_read), KVM can safely restart the instruction from the beginning. After the first MMIO (store to EDI) done, vcpu->mmio_read_completed is set and POPA is re-emulated with EMULTYPE_NO_DECODE. When re-emulating, KVM will try first MMIO (store to EDI) _again_, but KVM will not exit to userspace in emulator_read_write() because vcpu->mmio_read_completed is set and then adjust mc->end in read_emulated(). For the second MMIO (store to ESI), it will exit to userspace and re-emulate. When re-emulating, KVM can directly read EDI from the buffer (mc->data) and try second MMIO (store to ESI) _again_, but KVM will not exit to userspace in emulator_read_write() because vcpu->mmio_read_completed is set and then adjust mc->end in read_emulated(). For the third MMIO (store to EBP) ... > To make this work, KVM would need to allow precisely resuming execution where > it left off. We can't use MMIO fragments, because unlike MMIO accesses that > split pages, each memory load is an individual access. > Since all MMIO reads are saved to the buffer (mc->data) and the index (mc->end) is adjusted by every re-emulation, KVM already supports multiple MMIO reads. > I don't see any reason to try to make this work. It's a ridiculously convoluted > scenario that, AFAIK, has no real world application. > I agree POPA+MMIO is rare but supporting it to be same as hardware is cheap. We can also use POPA to validate the multiple MMIO reads which has a complex flow ( actually, that's why I tried POPA in kvm-unit-test and then find this issue). > > Signed-off-by: Tao Su <tao1.su@xxxxxxxxxxxxxxx> > > --- > > For testing, we can add POPA to the emulator case in kvm-unit-test. > > --- > > arch/x86/kvm/emulate.c | 6 +++++- > > 1 file changed, 5 insertions(+), 1 deletion(-) > > > > diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c > > index e72aed25d721..3746fef6ca60 100644 > > --- a/arch/x86/kvm/emulate.c > > +++ b/arch/x86/kvm/emulate.c > > @@ -1999,6 +1999,7 @@ static int em_pushf(struct x86_emulate_ctxt *ctxt) > > > > static int em_popa(struct x86_emulate_ctxt *ctxt) > > { > > + unsigned long old_esp = reg_read(ctxt, VCPU_REGS_RSP); > > int rc = X86EMUL_CONTINUE; > > int reg = VCPU_REGS_RDI; > > u32 val = 0; > > @@ -2010,8 +2011,11 @@ static int em_popa(struct x86_emulate_ctxt *ctxt) > > } > > > > rc = emulate_pop(ctxt, &val, ctxt->op_bytes); > > - if (rc != X86EMUL_CONTINUE) > > + if (rc != X86EMUL_CONTINUE) { > > + assign_register(reg_rmw(ctxt, VCPU_REGS_RSP), > > + old_esp, ctxt->op_bytes); > > break; > > + } > > assign_register(reg_rmw(ctxt, reg), val, ctxt->op_bytes); > > --reg; > > } > > > > base-commit: 786c8248dbd33a5a7a07f7c6e55a7bfc68d2ca48 > > -- > > 2.34.1 > >