On Fri, Jan 10, 2025 at 05:24:48PM -0800, Sean Christopherson wrote:
>Add a kvm_run flag, KVM_RUN_NEEDS_COMPLETION, to communicate to userspace
>that KVM_RUN needs to be re-executed prior to save/restore in order to
>complete the instruction/operation that triggered the userspace exit.
>
>KVM's current approach of adding notes in the Documentation is beyond
>brittle, e.g. there is at least one known case where a KVM developer added
>a new userspace exit type, and then that same developer forgot to handle
>completion when adding userspace support.

This answers one question I had:
https://lore.kernel.org/kvm/Z1bmUCEdoZ87wIMn@xxxxxxxxx/

So, it is the VMM's (i.e., QEMU's) responsibility to re-execute KVM_RUN in
this case.

Btw, can this flag be used to address the issue [*] with steal time
accounting? We can set the new flag for each vCPU in the PM notifier and we
need to change the re-execution to handle steal time accounting (not just IO
completion).

[*]: https://lore.kernel.org/kvm/Z36XJl1OAahVkxhl@xxxxxxxxxx/

one nit below,

>--- a/arch/x86/include/uapi/asm/kvm.h
>+++ b/arch/x86/include/uapi/asm/kvm.h
>@@ -104,9 +104,10 @@ struct kvm_ioapic_state {
> #define KVM_IRQCHIP_IOAPIC 2
> #define KVM_NR_IRQCHIPS 3
>
>-#define KVM_RUN_X86_SMM (1 << 0)
>-#define KVM_RUN_X86_BUS_LOCK (1 << 1)
>-#define KVM_RUN_X86_GUEST_MODE (1 << 2)
>+#define KVM_RUN_X86_SMM (1 << 0)
>+#define KVM_RUN_X86_BUS_LOCK (1 << 1)
>+#define KVM_RUN_X86_GUEST_MODE (1 << 2)
>+#define KVM_RUN_X86_NEEDS_COMPLETION (1 << 2)

This X86_NEEDS_COMPLETION should be dropped. It is never used.
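
To make sure I understand the userspace contract, here is a rough sketch of what I'd expect a VMM to do before saving vCPU state. It is only an illustration, not QEMU code: struct sketch_run stands in for the two struct kvm_run fields involved (immediate_exit and flags), SKETCH_RUN_NEEDS_COMPLETION is a made-up bit value rather than the one from the patch, and sketch_kvm_run() mocks ioctl(KVM_RUN) so the logic can be exercised without /dev/kvm. It does not cover the steal time question above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit value; the real one is whatever the uAPI header defines. */
#define SKETCH_RUN_NEEDS_COMPLETION (1u << 15)

/* Stand-in for the two struct kvm_run fields this sketch cares about. */
struct sketch_run {
	uint8_t  immediate_exit; /* ask KVM_RUN to return before entering the guest */
	uint16_t flags;          /* written by KVM on each exit to userspace */
};

/* Hypothetical per-vCPU bookkeeping in the VMM. */
struct sketch_vcpu {
	struct sketch_run run;
	int run_calls;           /* how many times the mocked KVM_RUN was invoked */
};

/*
 * Mock of ioctl(vcpu_fd, KVM_RUN, 0). The real ioctl, when immediate_exit is
 * set, finishes the pending operation and fails with EINTR; the mock simply
 * clears the flag and returns 0.
 */
static int sketch_kvm_run(struct sketch_vcpu *vcpu)
{
	vcpu->run_calls++;
	vcpu->run.flags &= (uint16_t)~SKETCH_RUN_NEEDS_COMPLETION;
	return 0;
}

/*
 * Call before save/restore. Re-executes KVM_RUN once, with immediate_exit
 * set so no further guest instructions run, if the last exit still needs
 * completion. Returns true if a re-execution was performed.
 */
static bool sketch_complete_before_save(struct sketch_vcpu *vcpu)
{
	assert(vcpu != NULL);

	if (!(vcpu->run.flags & SKETCH_RUN_NEEDS_COMPLETION))
		return false;

	vcpu->run.immediate_exit = 1;
	sketch_kvm_run(vcpu);
	vcpu->run.immediate_exit = 0;
	return true;
}
```

With something like this, the VMM no longer needs to know which exit reasons require completion; it only checks the flag after every exit that precedes a save.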