On Fri, Apr 12, 2024, Isaku Yamahata wrote:
> On Fri, Apr 12, 2024 at 09:15:29AM -0700, Reinette Chatre <reinette.chatre@xxxxxxxxx> wrote:
> > > +void tdx_mmu_release_hkid(struct kvm *kvm)
> > > +{
> > > +	while (__tdx_mmu_release_hkid(kvm) == -EBUSY)
> > > +		;
> > > +}
> > 
> > As I understand, __tdx_mmu_release_hkid() returns -EBUSY after
> > TDH.VP.FLUSH has been sent for every vCPU followed by
> > TDH.MNG.VPFLUSHDONE, which returns TDX_FLUSHVP_NOT_DONE.
> > 
> > Considering the earlier comment that a retry of TDH.VP.FLUSH is not
> > needed, why is this while() loop here that sends TDH.VP.FLUSH again
> > to all vCPUs instead of just a loop within __tdx_mmu_release_hkid()
> > to _just_ resend TDH.MNG.VPFLUSHDONE?
> > 
> > Could it be possible for a vCPU to appear during this time, thus be
> > missed in one TDH.VP.FLUSH cycle, to require a new cycle of
> > TDH.VP.FLUSH?
> 
> Yes. There is a race between closing the KVM vCPU fd and the MMU
> notifier release hook.  When the KVM vCPU fd is closed, the vCPU
> context can be loaded again.

But why is _loading_ a vCPU context problematic?  If I'm reading the TDX
module code correctly, TDX_FLUSHVP_NOT_DONE is returned when a vCPU is
"associated" with a pCPU, and association only happens during
TDH.VP.ENTER, TDH.MNG.RD, and TDH.MNG.WR, none of which I see in
tdx_vcpu_load().

Assuming there is something problematic lurking under vcpu_load(), I
would love, love, LOVE an excuse to not do vcpu_{load,put}() in
kvm_unload_vcpu_mmu(), i.e. get rid of that thing entirely.

I have definitely looked into kvm_unload_vcpu_mmu() on more than one
occasion, but I can't remember off the top of my head why I have never
yanked out the vcpu_{load,put}().  Maybe I was just scared of breaking
something and didn't have a good reason to risk breakage?

> The MMU notifier release hook eventually calls tdx_mmu_release_hkid().
> Another kernel thread (concretely, a vhost kernel thread) can take a
> reference to the mmu and drop it from a timer, so the MMU notifier
> release hook can be triggered while the vCPU fd is being closed.
> 
> The possible alternative is to make the vCPU closing path more
> complicated so that it does not load the vCPU context, instead of
> sending an IPI on every retry.
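
To make sure we're all looking at the same structure, here is a rough
sketch of what I understand __tdx_mmu_release_hkid() to be doing, i.e.
the IPI-on-every-retry flow under discussion.  The helper names
(tdx_flush_vp_on_cpu(), tdh_mng_vpflushdone()) and the kvm_tdx fields
are my assumptions for illustration, not quotes from the actual patch:

	/* Sketch only; helpers and fields are assumed, not from the patch. */
	static int __tdx_mmu_release_hkid(struct kvm *kvm)
	{
		struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
		struct kvm_vcpu *vcpu;
		unsigned long i;
		u64 err;

		/*
		 * TDH.VP.FLUSH for every vCPU, sent via IPI to the pCPU each
		 * vCPU is associated with.  Per the earlier comment, no retry
		 * is needed for the flush itself.
		 */
		kvm_for_each_vcpu(i, vcpu, kvm)
			tdx_flush_vp_on_cpu(vcpu);

		/*
		 * TDH.MNG.VPFLUSHDONE fails with TDX_FLUSHVP_NOT_DONE if any
		 * vCPU is still associated with a pCPU; per Isaku, the race
		 * with the vCPU fd close path can make that happen after the
		 * flushes above.  Returning -EBUSY makes the caller redo the
		 * whole TDH.VP.FLUSH cycle, hence the while() loop in
		 * tdx_mmu_release_hkid().
		 */
		err = tdh_mng_vpflushdone(kvm_tdx->tdr_pa);
		if (err == TDX_FLUSHVP_NOT_DONE)
			return -EBUSY;

		return err ? -EIO : 0;
	}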