On Thu, Aug 11, 2022, Paolo Bonzini wrote: > kvm_vcpu_check_block() is called while not in TASK_RUNNING, and therefore > it cannot sleep. Writing to guest memory is therefore forbidden, but it > can happen on AMD processors if kvm_check_nested_events() causes a vmexit. > > Fortunately, all events that are caught by kvm_check_nested_events() are > also recognized by kvm_vcpu_has_events() through vendor callbacks such as > kvm_x86_interrupt_allowed() or kvm_x86_ops.nested_ops->has_events(), so > remove the call and postpone the actual processing to vcpu_block(). > > Signed-off-by: Paolo Bonzini <pbonzini@xxxxxxxxxx> > --- > arch/x86/kvm/x86.c | 14 +++++++++++--- > 1 file changed, 11 insertions(+), 3 deletions(-) > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > index 5e9358ea112b..9226fd536783 100644 > --- a/arch/x86/kvm/x86.c > +++ b/arch/x86/kvm/x86.c > @@ -10639,6 +10639,17 @@ static inline int vcpu_block(struct kvm_vcpu *vcpu) > return 1; > } > > + if (is_guest_mode(vcpu)) { > + /* > + * Evaluate nested events before exiting the halted state. > + * This allows the halt state to be recorded properly in > + * the VMCS12's activity state field (AMD does not have > + * a similar field and a vmexit always causes a spurious > + * wakeup from HLT). > + */ > + kvm_check_nested_events(vcpu); Formatting nit, I'd prefer the block comment go above the if-statement, that way we avoiding debating whether or not the technically-unnecessary braces align with kernel/KVM style, and it doesn't have to wrap as aggressively. And s/vmexit/VM-Exit while I'm nitpicking. /* * Evaluate nested events before exiting the halted state. This allows * the halt state to be recorded properly in the VMCS12's activity * state field (AMD does not have a similar field and a VM-Exit always * causes a spurious wakeup from HLT). */ if (is_guest_mode(vcpu)) kvm_check_nested_events(vcpu); Side topic, the AMD behavior is a bug report waiting to happen. I know of at least one customer failure that was root caused to a KVM bug where KVM caused a spurious wakeup. To be fair, the guest workload was being stupid (execute HLT on vCPU and then effectively unmap its code by doing kexec), but it's still an unpleasant gap :-(