On Mon, Jan 13, 2025, Chao Gao wrote:
> On Fri, Jan 10, 2025 at 05:24:48PM -0800, Sean Christopherson wrote:
> >Add a kvm_run flag, KVM_RUN_NEEDS_COMPLETION, to communicate to userspace
> >that KVM_RUN needs to be re-executed prior to save/restore in order to
> >complete the instruction/operation that triggered the userspace exit.
> >
> >KVM's current approach of adding notes in the Documentation is beyond
> >brittle, e.g. there is at least one known case where a KVM developer added
> >a new userspace exit type, and then that same developer forgot to handle
> >completion when adding userspace support.
>
> This answers one question I had:
> https://lore.kernel.org/kvm/Z1bmUCEdoZ87wIMn@xxxxxxxxx/
>
> So, it is the VMM's (i.e., QEMU's) responsibility to re-execute KVM_RUN in this
> case.

Yep.

> Btw, can this flag be used to address the issue [*] with steal time accounting?
> We can set the new flag for each vCPU in the PM notifier and we need to change
> the re-execution to handle steal time accounting (not just IO completion).
>
> [*]: https://lore.kernel.org/kvm/Z36XJl1OAahVkxhl@xxxxxxxxxx/

Uh, hmm.  Partially?  And not without creating new, potentially worse problems.

I like the idea, but (a) there's no guarantee a vCPU would be "in" KVM_RUN at
the time of suspend, and (b) KVM would need to take vcpu->mutex in the PM
notifier in order to avoid clobbering the current completion callback, which is
definitely a net negative (hello, deadlocks).

E.g. if a vCPU task is in userspace processing emulated MMIO at the time of
suspend+resume, KVM's completion callback will be non-zero and must be preserved.
And if a vCPU task is in userspace processing an exit that _doesn't_ require
completion, setting KVM_RUN_NEEDS_COMPLETION would likely be missed by userspace,
e.g. if userspace checks the flag only after regaining control from KVM_RUN.

In general, I think setting KVM_RUN_NEEDS_COMPLETION outside of KVM_RUN would add
too much complexity.
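For illustration, a minimal userspace-side sketch of the contract the patch describes: before saving vCPU state, the VMM keeps re-entering KVM_RUN while the flag is set. It also shows why the flag is only useful when set from within KVM_RUN, since the VMM looks at it only after KVM_RUN returns. Everything here is hypothetical: the generic flag's numeric value is not in the quoted hunk, `struct fake_kvm_run` stands in for the `flags` field of `struct kvm_run`, and `hypothetical_reenter_kvm_run()` models `ioctl(vcpu_fd, KVM_RUN, 0)` with `immediate_exit` set, which KVM's API documentation describes as a way to complete pending operations without executing further guest instructions.

```c
#include <stdint.h>

/* HYPOTHETICAL placeholder value; the real bit is not shown in the hunk. */
#define HYPOTHETICAL_KVM_RUN_NEEDS_COMPLETION (1u << 15)

/* Stand-in for the 16-bit 'flags' field of struct kvm_run. */
struct fake_kvm_run {
	uint16_t flags;
};

/*
 * HYPOTHETICAL model of re-entering KVM_RUN with immediate_exit set:
 * KVM finishes the pending operation and no longer reports the flag.
 */
static int hypothetical_reenter_kvm_run(struct fake_kvm_run *run, int *nr_runs)
{
	(*nr_runs)++;
	run->flags &= (uint16_t)~HYPOTHETICAL_KVM_RUN_NEEDS_COMPLETION;
	return 0;
}

/* Re-enter KVM_RUN until KVM no longer requests completion, then save. */
static int prepare_vcpu_for_save(struct fake_kvm_run *run, int *nr_runs)
{
	while (run->flags & HYPOTHETICAL_KVM_RUN_NEEDS_COMPLETION) {
		if (hypothetical_reenter_kvm_run(run, nr_runs) < 0)
			return -1;
	}
	return 0;	/* vCPU state is now consistent for save/restore */
}
```

With the flag set, `prepare_vcpu_for_save()` re-enters once; with it clear, it returns immediately without calling KVM_RUN.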
> one nit below,
>
> >--- a/arch/x86/include/uapi/asm/kvm.h
> >+++ b/arch/x86/include/uapi/asm/kvm.h
> >@@ -104,9 +104,10 @@ struct kvm_ioapic_state {
> > #define KVM_IRQCHIP_IOAPIC       2
> > #define KVM_NR_IRQCHIPS          3
> >
> >-#define KVM_RUN_X86_SMM          (1 << 0)
> >-#define KVM_RUN_X86_BUS_LOCK     (1 << 1)
> >-#define KVM_RUN_X86_GUEST_MODE   (1 << 2)
> >+#define KVM_RUN_X86_SMM                 (1 << 0)
> >+#define KVM_RUN_X86_BUS_LOCK            (1 << 1)
> >+#define KVM_RUN_X86_GUEST_MODE          (1 << 2)
> >+#define KVM_RUN_X86_NEEDS_COMPLETION    (1 << 2)
>
> This X86_NEEDS_COMPLETION should be dropped. It is never used.

Gah, thanks!
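Besides being unused, the stray define reuses bit 2, the same bit as KVM_RUN_X86_GUEST_MODE, so any userspace test of `flags & KVM_RUN_X86_NEEDS_COMPLETION` would also fire whenever the guest-mode bit is set. A small sketch of a disjointness check over the flag values from the quoted hunk; the helper `flags_are_disjoint()` and the two arrays are hypothetical illustrations, not part of the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Values copied from the quoted hunk. */
#define KVM_RUN_X86_SMM                 (1 << 0)
#define KVM_RUN_X86_BUS_LOCK            (1 << 1)
#define KVM_RUN_X86_GUEST_MODE          (1 << 2)
#define KVM_RUN_X86_NEEDS_COMPLETION    (1 << 2)	/* stray define from review */

/* HYPOTHETICAL helper: true if no two masks in flags[] share a set bit. */
static bool flags_are_disjoint(const unsigned int *flags, size_t n)
{
	unsigned int seen = 0;

	for (size_t i = 0; i < n; i++) {
		if (seen & flags[i])
			return false;	/* bit already claimed by an earlier flag */
		seen |= flags[i];
	}
	return true;
}

/* Flag set as posted, including the stray define. */
static const unsigned int patched_flags[] = {
	KVM_RUN_X86_SMM, KVM_RUN_X86_BUS_LOCK,
	KVM_RUN_X86_GUEST_MODE, KVM_RUN_X86_NEEDS_COMPLETION,
};

/* Flag set with the stray define dropped, as suggested in review. */
static const unsigned int fixed_flags[] = {
	KVM_RUN_X86_SMM, KVM_RUN_X86_BUS_LOCK, KVM_RUN_X86_GUEST_MODE,
};
```

The posted set fails the check and the set without the stray define passes.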