On Tue, Oct 24, 2023 at 08:49:24AM -0700, Sean Christopherson wrote:
> On Tue, Oct 24, 2023, Xu Yilun wrote:
> > On Wed, Oct 18, 2023 at 01:46:23PM -0700, Sean Christopherson wrote:
> > > @@ -126,7 +124,19 @@ void kvm_clear_async_pf_completion_queue(struct kvm_vcpu *vcpu)
> > >  			list_first_entry(&vcpu->async_pf.done,
> > >  					 typeof(*work), link);
> > >  		list_del(&work->link);
> > > +
> > > +		spin_unlock(&vcpu->async_pf.lock);
> > > +
> > > +		/*
> > > +		 * The async #PF is "done", but KVM must wait for the work item
> > > +		 * itself, i.e. async_pf_execute(), to run to completion.  If
> > > +		 * KVM is a module, KVM must ensure *no* code owned by the KVM
> > > +		 * (the module) can be run after the last call to module_put(),
> > > +		 * i.e. after the last reference to the last vCPU's file is put.
> > > +		 */
> > > +		flush_work(&work->work);
> >
> > I see the flush_work() is inside the check:
> >
> > 	while (!list_empty(&vcpu->async_pf.done))
> >
> > Is it possible that all async_pf entries are already completed, but the work
> > item, i.e. async_pf_execute(), has not finished before this check?  E.g. the
> > work is scheduled out after kvm_arch_async_page_present_queued() and all
> > APF_READY requests have been handled.  In that case the work synchronization
> > will be skipped...
>
> Good gravy.  Yes, I assumed KVM wouldn't be so crazy as to delete the work
> before it completed, but I obviously didn't see this comment in
> async_pf_execute():
>
> 	/*
> 	 * apf may be freed by kvm_check_async_pf_completion() after
> 	 * this point
> 	 */
>
> The most straightforward fix I see is to also flush the work in
> kvm_check_async_pf_completion(), and then delete the comment.  The downside is
> that there's a small chance a vCPU could be delayed waiting for the work to
> complete, but that's a very, very small chance, and likely a very small delay.
> kvm_arch_async_page_present_queued() unconditionally makes a new request, i.e.
> will effectively delay entering the guest, so the remaining work is really just:
>
> 	trace_kvm_async_pf_completed(addr, cr2_or_gpa);
>
> 	__kvm_vcpu_wake_up(vcpu);
>
> 	mmput(mm);
>
> since mmput() can't drop the last reference to the page tables if the vCPU is
> still alive.

OK, seems the impact is minor.  I'm good with it.

Thanks,
Yilun

> I think I'll spin off the async #PF fix to a separate series.  There are other
> tangentially related cleanups that can be done, e.g. there's no reason to pin
> the page tables while work is queued; async_pf_execute() can do
> mmget_not_zero() and then bail if the process is dying.  Then there's no need
> to do mmput() when canceling work.
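
[Editor's note: for reference, a minimal sketch of the fix being discussed, i.e.
having both dequeue paths wait for async_pf_execute() before freeing the work
item.  The helper name kvm_flush_and_free_async_pf_work() and its exact shape
are illustrative assumptions for this sketch, not necessarily what lands
upstream.]

static void kvm_flush_and_free_async_pf_work(struct kvm_async_pf *work)
{
	/*
	 * Wait for async_pf_execute() to return before freeing the work
	 * item, so that no code owned by kvm.ko can still be running when
	 * the last module reference is dropped.  flush_work() only waits
	 * for the callback to finish; it does not re-queue or re-run it.
	 */
	flush_work(&work->work);
	kmem_cache_free(async_pf_cache, work);
}

With a helper like this, both kvm_clear_async_pf_completion_queue() and
kvm_check_async_pf_completion() would free dequeued items through it, which
closes the race Yilun points out and allows deleting the "apf may be freed by
kvm_check_async_pf_completion() after this point" comment in async_pf_execute().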
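
[Editor's note: and a rough sketch of the tangentially related cleanup from the
last quoted paragraph.  The struct members and the get_user_pages_remote()
signature assumed here roughly match the ~6.6 code this thread is against, and
the arch bookkeeping at the end of the worker is elided; this illustrates the
idea, it is not a tested patch.]

static void async_pf_execute(struct work_struct *work)
{
	struct kvm_async_pf *apf = container_of(work, struct kvm_async_pf, work);
	struct mm_struct *mm = apf->mm;
	unsigned long addr = apf->addr;
	int locked = 1;

	might_sleep();

	/*
	 * Take a reference to the mm only for the duration of the fault-in
	 * and bail if the process is already dying, instead of pinning the
	 * page tables for as long as the work is queued.  Canceling queued
	 * work then no longer requires a matching mmput().
	 */
	if (mmget_not_zero(mm)) {
		mmap_read_lock(mm);
		get_user_pages_remote(mm, addr, 1, FOLL_WRITE, NULL, &locked);
		if (locked)
			mmap_read_unlock(mm);
		mmput(mm);
	}

	/* ... arch async #PF bookkeeping, tracepoint, and vCPU wakeup ... */
}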