On Wed, Jul 24, 2024, Tao Su wrote: > When emulating POPA and exiting to userspace for MMIO, reset modified RSP > as emulation context may not be reset. POPA may generate more multiple > reads, i.e. multiple POPs from the stack, but if stack points to MMIO, > KVM needs to emulate multiple MMIO reads. > > When one MMIO done, POPA may be re-emulated with EMULTYPE_NO_DECODE set, > i.e. ctxt will not be reset, but RSP is modified by previous emulation of > current POPA instruction, which eventually leads to emulation error. > > The commit 0dc902267cb3 ("KVM: x86: Suppress pending MMIO write exits if > emulator detects exception") provides a detailed analysis of how KVM > emulates multiple MMIO reads, and its correctness can be verified in the > POPA instruction with this patch. I don't see how this can work. If POPA is reading from MMIO, it will need to do 8 distinct emulated MMIO accesses. Unwinding to the original RSP will allow the first MMIO (store to EDI) to succeed, but then the second MMIO (store to ESI) will exit back to userspace. And the second restart will load EDI with the result of the MMIO, not ESI. It will also re-trigger the second MMIO indefinitely. To make this work, KVM would need to allow precisely resuming execution where it left off. We can't use MMIO fragments, because unlike MMIO accesses that split pages, each memory load is an individual access. I don't see any reason to try to make this work. It's a ridiculously convoluted scenario that, AFAIK, has no real world application. > Signed-off-by: Tao Su <tao1.su@xxxxxxxxxxxxxxx> > --- > For testing, we can add POPA to the emulator case in kvm-unit-test. > --- > arch/x86/kvm/emulate.c | 6 +++++- > 1 file changed, 5 insertions(+), 1 deletion(-) > > diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c > index e72aed25d721..3746fef6ca60 100644 > --- a/arch/x86/kvm/emulate.c > +++ b/arch/x86/kvm/emulate.c > @@ -1999,6 +1999,7 @@ static int em_pushf(struct x86_emulate_ctxt *ctxt) > > static int em_popa(struct x86_emulate_ctxt *ctxt) > { > + unsigned long old_esp = reg_read(ctxt, VCPU_REGS_RSP); > int rc = X86EMUL_CONTINUE; > int reg = VCPU_REGS_RDI; > u32 val = 0; > @@ -2010,8 +2011,11 @@ static int em_popa(struct x86_emulate_ctxt *ctxt) > } > > rc = emulate_pop(ctxt, &val, ctxt->op_bytes); > - if (rc != X86EMUL_CONTINUE) > + if (rc != X86EMUL_CONTINUE) { > + assign_register(reg_rmw(ctxt, VCPU_REGS_RSP), > + old_esp, ctxt->op_bytes); > break; > + } > assign_register(reg_rmw(ctxt, reg), val, ctxt->op_bytes); > --reg; > } > > base-commit: 786c8248dbd33a5a7a07f7c6e55a7bfc68d2ca48 > -- > 2.34.1 >