On Wed, Oct 30, 2024 at 12:00:21PM -0700, Rick Edgecombe wrote:
> From: Isaku Yamahata <isaku.yamahata@xxxxxxxxx>
>
> Add tdh_phymem_page_reclaim() to enable KVM to call
> TDH.PHYMEM.PAGE.RECLAIM to reclaim the page for use by the host kernel.
> This effectively resets its state in the TDX module's page tracking
> (PAMT), if the page is available to be reclaimed. This will be used by KVM
> to reclaim the various types of pages owned by the TDX module. It will
> have a small wrapper in KVM that retries in the case of a relevant error
> code. Don't implement this wrapper in arch/x86 because KVM's solution
> around retrying SEAMCALLs will be better located in a single place.

With the current KVM code, it looks like KVM may not need the wrapper to
retry tdh_phymem_page_reclaim().

The logic of SEAMCALL TDH_PHYMEM_PAGE_RECLAIM is like this:

SEAMCALL TDH_PHYMEM_PAGE_RECLAIM:
1. pamt_walk
   case (a): if to reclaim TDR:
             get shared lock of 1gb and 2mb pamt entries of TDR page,
             get exclusive lock of 4k pamt entry of TDR page.

   case (b): if to reclaim non-TDR & non-TD pages,
             get shared lock of 1gb and 2mb pamt entries of the page to
             reclaim,
             get exclusive lock of 4k pamt entry of the page to reclaim.

   case (c): if to reclaim TD pages,
             get exclusive lock of 1gb or 2mb or 4k pamt entry of the page
             to reclaim, depending on the page size of the page to reclaim,
             get shared lock of pamt entries above the page size.

2. check the exclusively locked pamt entry of the page to reclaim
   (e.g. page type, alignment)

3. case (a): if to reclaim TDR,
             map and check TDR page.
   case (b)(c): if to reclaim non-TDR pages or TD pages,
             get shared lock of 4k pamt entry of TDR page,
             map and check TDR page, atomically update TDR child cnt.

4. set page type to NDA in the exclusively locked pamt entry of the page
   to reclaim.

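As a rough illustration of the locking above, here is a small userspace
model. It is not TDX module code. It assumes the PAMT entry locks are
try-locks that fail immediately instead of blocking, and it only models the
leaf pamt entry of the page to reclaim plus the 4k pamt entry of the TDR
page (the shared locks on the upper-level entries are left out). All names
(struct pamt_lock, model_reclaim(), MODEL_BUSY) are made up for this sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one PAMT entry lock: try-lock only, never blocks. */
struct pamt_lock {
	bool exclusive;		/* held exclusively */
	unsigned int shared;	/* number of shared holders */
};

static bool pamt_try_shared(struct pamt_lock *l)
{
	if (l->exclusive)
		return false;
	l->shared++;
	return true;
}

static bool pamt_try_exclusive(struct pamt_lock *l)
{
	if (l->exclusive || l->shared)
		return false;
	l->exclusive = true;
	return true;
}

static void pamt_unlock_shared(struct pamt_lock *l)
{
	assert(l->shared > 0);
	l->shared--;
}

static void pamt_unlock_exclusive(struct pamt_lock *l)
{
	assert(l->exclusive);
	l->exclusive = false;
}

enum { MODEL_OK = 0, MODEL_BUSY = -1 };

/*
 * Model of steps 1-4 for one page. Passing page == tdr models case (a),
 * i.e. reclaiming the TDR page itself.
 */
static int model_reclaim(struct pamt_lock *page, struct pamt_lock *tdr)
{
	/* Step 1: exclusive lock on the pamt entry of the page to reclaim. */
	if (!pamt_try_exclusive(page))
		return MODEL_BUSY;

	/* Step 3, cases (b)(c): shared lock on the TDR page's 4k pamt entry. */
	if (page != tdr) {
		if (!pamt_try_shared(tdr)) {
			pamt_unlock_exclusive(page);
			return MODEL_BUSY;
		}
		/* The TDR child count update would happen here. */
		pamt_unlock_shared(tdr);
	}

	/* Step 4: the page type would be set to NDA here; then unlock. */
	pamt_unlock_exclusive(page);
	return MODEL_OK;
}
```

In this model a reclaim only returns MODEL_BUSY if some other context holds
a conflicting lock on the same pamt entry at that moment; back-to-back calls
from a single thread always succeed.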
In summary,

------------------------------------------------------------------------------
page to reclaim     | locks
--------------------|---------------------------------------------------------
TDR                 | exclusive lock of 4k pamt entry of TDR page
--------------------|---------------------------------------------------------
non-TDR and non-TD  | shared lock of 4k pamt entry of TDR page
                    | exclusive lock of 4k pamt entry of page to reclaim
--------------------|---------------------------------------------------------
TD page             | shared lock of 4k pamt entry of TDR page
                    | exclusive lock of pamt entry of size of page to reclaim
------------------------------------------------------------------------------

When the TD is tearing down,
- TD pages are removed and freed while the hkid is still assigned, so
  tdh_phymem_page_reclaim() will not be called for them.
- after vt_vm_destroy() releases the hkid, kvm_arch_destroy_vm() calls
  kvm_destroy_vcpus(), kvm_mmu_uninit_tdp_mmu() and tdx_vm_free() to reclaim
  TDCX/TDVPR/EPT/TDR pages sequentially in a single thread.

So, no contention is expected when current KVM calls
tdh_phymem_page_reclaim().

> +u64 tdh_phymem_page_reclaim(u64 page, u64 *rcx, u64 *rdx, u64 *r8)
> +{
> +	struct tdx_module_args args = {
> +		.rcx = page,
> +	};
> +	u64 ret;
> +
> +	ret = seamcall_ret(TDH_PHYMEM_PAGE_RECLAIM, &args);
> +
> +	/*
> +	 * Additional error information:
> +	 *
> +	 *  - RCX: page type
> +	 *  - RDX: owner
> +	 *  - R8:  page size (4K, 2M or 1G)
> +	 */
> +	*rcx = args.rcx;
> +	*rdx = args.rdx;
> +	*r8 = args.r8;
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(tdh_phymem_page_reclaim);
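
If no contention is expected, the KVM side could in principle call the
function once and treat any error, busy included, as a bug instead of
looping. Below is a minimal userspace sketch of that shape, not a proposed
patch. reclaim_page_once() and the two stubs are hypothetical names, the
stubs stand in for the real SEAMCALL wrapper, and the non-zero status they
return is a placeholder rather than a real TDX error code.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

typedef uint64_t u64;

/* Signature copied from the patch, so the sketch can run against a stub. */
typedef u64 (*reclaim_fn)(u64 page, u64 *rcx, u64 *rdx, u64 *r8);

/*
 * Hypothetical caller: one attempt, no retry loop. Any non-zero status is
 * reported together with the additional error information and returned as
 * a failure.
 */
static int reclaim_page_once(reclaim_fn reclaim, u64 pa)
{
	u64 rcx = 0, rdx = 0, r8 = 0;
	u64 err;

	assert(reclaim);
	err = reclaim(pa, &rcx, &rdx, &r8);
	if (err) {
		fprintf(stderr,
			"reclaim 0x%" PRIx64 " failed: err 0x%" PRIx64
			" type 0x%" PRIx64 " owner 0x%" PRIx64
			" size 0x%" PRIx64 "\n",
			pa, err, rcx, rdx, r8);
		return -1;
	}
	return 0;
}

/* Stub that always succeeds. */
static u64 stub_reclaim_ok(u64 page, u64 *rcx, u64 *rdx, u64 *r8)
{
	(void)page;
	*rcx = *rdx = *r8 = 0;
	return 0;
}

/* Stub that always fails with a placeholder status. */
static u64 stub_reclaim_busy(u64 page, u64 *rcx, u64 *rdx, u64 *r8)
{
	(void)page;
	*rcx = *rdx = *r8 = 0;
	return 1;	/* placeholder, not a real TDX error code */
}
```

In the kernel the fprintf() would presumably be a WARN or a pr_err() with
the same register dump, so that unexpected contention shows up loudly
rather than being hidden by a retry loop.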