[...] > > > > And why do you need a special function just for control page(s)? > > We can revise the code to have common function for reclaiming page. I interpret this as you will remove tdx_reclaim_control_page(), and have one function to reclaim _ALL_ TDX private pages. > > > > > > > How about this? > > > > > > > > > > /* > > > > > * We need three SEAMCALLs, TDH.MNG.VPFLUSHDONE(), TDH.PHYMEM.CACHE.WB(), and > > > > > * TDH.MNG.KEY.FREEID() to free the HKID. > > > > > * Other threads can remove pages from TD. When the HKID is assigned, we need > > > > > * to use TDH.MEM.SEPT.REMOVE() or TDH.MEM.PAGE.REMOVE(). > > > > > * TDH.PHYMEM.PAGE.RECLAIM() is needed when the HKID is free. Get lock to not > > > > > * present transient state of HKID. > > > > > */ > > > > > > > > Could you elaborate why it is still possible to have other thread removing > > > > pages from TD? > > > > > > > > I am probably missing something, but the thing I don't understand is why > > > > this function is triggered by MMU release? All the things done in this > > > > function don't seem to be related to MMU at all. > > > > > > The KVM releases EPT pages on MMU notifier release. kvm_mmu_zap_all() does. If > > > we follow that way, kvm_mmu_zap_all() zaps all the Secure-EPTs by > > > TDH.MEM.SEPT.REMOVE() or TDH.MEM.PAGE.REMOVE(). Because > > > TDH.MEM.{SEPT, PAGE}.REMOVE() is slow, we can free HKID before kvm_mmu_zap_all() > > > to use TDH.PHYMEM.PAGE.RECLAIM(). > > > > Can you elaborate why TDH.MEM.{SEPT,PAGE}.REMOVE is slower than > > TDH.PHYMEM.PAGE.RECLAIM()? > > > > And does the difference matter in practice, i.e. did you see using the former > > having noticeable performance downgrade? > > Yes. With HKID alive, we have to assume that vcpu can run still. It means TLB > shootdown. The difference is 2 extra SEAMCALL + IPI synchronization for each > guest private page. If the guest has hundreds of GB, the difference can be > tens of minutes. > > With HKID alive, we need to assume vcpu is alive. > - TDH.MEM.PAGE.REMOVE() > - TDH.PHYMEM.PAGE_WBINVD() > - TLB shoot down > - TDH.MEM.TRACK() > - IPI to other vcpus > - wait for other vcpu to exit > > After freeing HKID > - TDH.PHYMEM.PAGE.RECLAIM() > We already flushed TLBs and memory cache. > > > > [...] > > > > Firstly, what kinda performance efficiency gain are we talking about? > > 2 extra SEAMCALL + IPI sync for each guest private page. If the guest memory > is hundreds of GB, the difference can be tens of minutes. [...] > > > > We cannot really tell whether it can be justified to use two different methods > > to tear down SEPT page because of this. > > > > Even if it's worth to do, it is an optimization, which can/should be done later > > after you have put all building blocks together. > > > > That being said, you are putting too many logic in this patch, i.e., it just > > doesn't make sense to release TDX keyID in the MMU code path in _this_ patch. > > I agree that this patch is too huge, and that we should break it into smaller > patches. IMHO it's not only breaking into smaller pieces, but also you are mixing performance optimization and essential functionalities together. Moving reclaiming TDX private KeyID to MMU notifier (in order to have a better performance when shutting down TDX guest) depends on a lot SEPT details (how to reclaim private page, TLB flush etc), which haven't yet been mentioned at all. It's hard to review code like this. I think here in this patch, we should just put reclaiming TDX keyID to the "normal" place. After you have done all SEPT (and related) patches, you can have a patch to improve the performance: KVM: TDX: Improve TDX guest shutdown latency Then you put your performance data there, i.e., "tens of minutes difference for TDX guest with hundreds of GB memory", to justify that patch.