On Thu, May 30, 2024 at 11:07 PM Rick Edgecombe <rick.p.edgecombe@xxxxxxxxx> wrote:
>
> From: Isaku Yamahata <isaku.yamahata@xxxxxxxxx>
>
> Add a mirrored pointer to struct kvm_mmu_page for the private page table and
> add helper functions to allocate/initialize/free a private page table page.
> Because KVM TDP MMU doesn't use unsync_children and write_flooding_count,
> pack them to have room for a pointer and use a union to avoid memory
> overhead.
>
> For private GPA, CPU refers to a private page table whose contents are
> encrypted.  The dedicated APIs to operate on it (e.g. updating/reading its
> PTE entry) are used, and their cost is expensive.
>
> When KVM resolves the KVM page fault, it walks the page tables.  To reuse
> the existing KVM MMU code and mitigate the heavy cost of directly walking
> the private page table, allocate one more page for the mirrored page table
> for the KVM MMU code to directly walk.  Resolve the KVM page fault with
> the existing code, and do additional operations necessary for the private
> page table.  To distinguish such cases, the existing KVM page table is
> called a shared page table (i.e., not associated with a private page
> table), and the page table with a private page table is called a mirrored
> page table.  The relationship is depicted below.
>
>           KVM page fault                     |
>                  |                           |
>                  V                           |
>     -------------+----------                 |
>     |                      |                 |
>     V                      V                 |
>  shared GPA           private GPA            |
>     |                      |                 |
>     V                      V                 |
> shared PT root      mirror PT root           |    private PT root
>     |                      |                 |           |
>     V                      V                 |           V
>  shared PT           mirror PT     --propagate-->  private/mirrored PT
>     |                      |                 |           |
>     |                      \-----------------+------\    |
>     |                                        |      |    |
>     V                                        |      V    V
> shared guest page                            |    private guest page
>                                              |
>      non-encrypted memory                    |    encrypted memory
>                                              |
> PT: Page table
> Shared PT: visible to KVM, and the CPU uses it for shared mappings.
> Private/mirrored PT: the CPU uses it, but it is invisible to KVM.  TDX
>                      module updates this table to map private guest pages.
> Mirror PT: It is visible to KVM, but the CPU doesn't use it.  KVM uses it
>            to propagate PT change to the actual private PT.

Which one is the "Mirror" and which one is the "Mirrored" PT is
uncomfortably confusing.  I hate to bikeshed even more, but while I
like "Mirror PT" (a lot), I would stick with "Private" or perhaps
"External" for the pages owned by the TDX module.

> +	/*
> +	 * This cache is to allocate private page table. E.g. private EPT used
> +	 * by the TDX module.
> +	 */
> +	struct kvm_mmu_memory_cache mmu_mirrored_spt_cache;

So this would be "mmu_external_spt_cache".

> -	unsigned int unsync_children;
> +	union {
> +		/* Those two members aren't used for TDP MMU */

s/Those/These/

> +		struct {
> +			unsigned int unsync_children;
> +			/*
> +			 * Number of writes since the last time traversal
> +			 * visited this page.
> +			 */
> +			atomic_t write_flooding_count;
> +		};
> +		/*
> +		 * Page table page of private PT.
> +		 * Passed to TDX module, not accessed by KVM.
> +		 */
> +		void *mirrored_spt;

external_spt

> +static inline void kvm_mmu_alloc_mirrored_spt(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
> +{
> +	/*
> +	 * mirrored_spt is allocated for TDX module to hold private EPT mappings,
> +	 * TDX module will initialize the page by itself.
> +	 * Therefore, KVM does not need to initialize or access mirrored_spt.
> +	 * KVM only interacts with sp->spt for mirrored EPT operations.
> +	 */
> +	sp->mirrored_spt = kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_mirrored_spt_cache);
> +}
> +
> +static inline void kvm_mmu_alloc_private_spt(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
> +{
> +	/*
> +	 * private_spt is allocated for TDX module to hold private EPT mappings,
> +	 * TDX module will initialize the page by itself.
> +	 * Therefore, KVM does not need to initialize or access private_spt.
> +	 * KVM only interacts with sp->spt for mirrored EPT operations.
> +	 */
> +	sp->mirrored_spt = kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_mirrored_spt_cache);
> +}

Duplicate function.

Naming aside, looks good.

Paolo

On Thu, May 30, 2024 at 11:07 PM Rick Edgecombe <rick.p.edgecombe@xxxxxxxxx> wrote:
>
> From: Isaku Yamahata <isaku.yamahata@xxxxxxxxx>
>
> Add a mirrored pointer to struct kvm_mmu_page for the private page table and
> add helper functions to allocate/initialize/free a private page table page.
> Because KVM TDP MMU doesn't use unsync_children and write_flooding_count,
> pack them to have room for a pointer and use a union to avoid memory
> overhead.
>
> For private GPA, CPU refers to a private page table whose contents are
> encrypted.  The dedicated APIs to operate on it (e.g. updating/reading its
> PTE entry) are used, and their cost is expensive.
>
> When KVM resolves the KVM page fault, it walks the page tables.  To reuse
> the existing KVM MMU code and mitigate the heavy cost of directly walking
> the private page table, allocate one more page for the mirrored page table
> for the KVM MMU code to directly walk.  Resolve the KVM page fault with
> the existing code, and do additional operations necessary for the private
> page table.  To distinguish such cases, the existing KVM page table is
> called a shared page table (i.e., not associated with a private page
> table), and the page table with a private page table is called a mirrored
> page table.  The relationship is depicted below.
>
>           KVM page fault                     |
>                  |                           |
>                  V                           |
>     -------------+----------                 |
>     |                      |                 |
>     V                      V                 |
>  shared GPA           private GPA            |
>     |                      |                 |
>     V                      V                 |
> shared PT root      mirror PT root           |    private PT root
>     |                      |                 |           |
>     V                      V                 |           V
>  shared PT           mirror PT     --propagate-->  private/mirrored PT
>     |                      |                 |           |
>     |                      \-----------------+------\    |
>     |                                        |      |    |
>     V                                        |      V    V
> shared guest page                            |    private guest page
>                                              |
>      non-encrypted memory                    |    encrypted memory
>                                              |
> PT: Page table
> Shared PT: visible to KVM, and the CPU uses it for shared mappings.
> Private/mirrored PT: the CPU uses it, but it is invisible to KVM.  TDX
>                      module updates this table to map private guest pages.
> Mirror PT: It is visible to KVM, but the CPU doesn't use it.  KVM uses it
>            to propagate PT change to the actual private PT.
>
> Add a helper kvm_has_mirrored_tdp() to trigger this behavior and wire it
> to the TDX vm type.
>
> Co-developed-by: Yan Zhao <yan.y.zhao@xxxxxxxxx>
> Signed-off-by: Yan Zhao <yan.y.zhao@xxxxxxxxx>
> Signed-off-by: Isaku Yamahata <isaku.yamahata@xxxxxxxxx>
> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@xxxxxxxxx>
> Reviewed-by: Binbin Wu <binbin.wu@xxxxxxxxxxxxxxx>
> ---
> TDX MMU Prep v2:
>  - Rename private->mirror
>  - Don't trigger off of shared mask
>
> TDX MMU Prep:
> - Rename terminology, dummy PT => mirror PT. and updated the commit message
>   By Rick and Kai.
> - Added a comment on union of private_spt by Rick.
> - Don't handle the root case in kvm_mmu_alloc_private_spt(), it will not
>   be needed in future patches. (Rick)
> - Update comments (Yan)
> - Remove kvm_mmu_init_private_spt(), open code it in later patches (Yan)
>
> v19:
> - typo in the comment in kvm_mmu_alloc_private_spt()
> - drop CONFIG_KVM_MMU_PRIVATE
> ---
>  arch/x86/include/asm/kvm_host.h |  5 ++++
>  arch/x86/kvm/mmu.h              |  5 ++++
>  arch/x86/kvm/mmu/mmu.c          |  7 +++++
>  arch/x86/kvm/mmu/mmu_internal.h | 47 ++++++++++++++++++++++++++++++---
>  arch/x86/kvm/mmu/tdp_mmu.c      |  1 +
>  5 files changed, 61 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index aabf1648a56a..250899a0239b 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -817,6 +817,11 @@ struct kvm_vcpu_arch {
>  	struct kvm_mmu_memory_cache mmu_shadow_page_cache;
>  	struct kvm_mmu_memory_cache mmu_shadowed_info_cache;
>  	struct kvm_mmu_memory_cache mmu_page_header_cache;
> +	/*
> +	 * This cache is to allocate private page table. E.g. private EPT used
> +	 * by the TDX module.
> +	 */
> +	struct kvm_mmu_memory_cache mmu_mirrored_spt_cache;
>
>  	/*
>  	 * QEMU userspace and the guest each have their own FPU state.
> diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
> index dc80e72e4848..0c3bf89cf7db 100644
> --- a/arch/x86/kvm/mmu.h
> +++ b/arch/x86/kvm/mmu.h
> @@ -318,4 +318,9 @@ static inline gpa_t kvm_translate_gpa(struct kvm_vcpu *vcpu,
>  		return gpa;
>  	return translate_nested_gpa(vcpu, gpa, access, exception);
>  }
> +
> +static inline bool kvm_has_mirrored_tdp(const struct kvm *kvm)
> +{
> +	return kvm->arch.vm_type == KVM_X86_TDX_VM;
> +}
>  #endif
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index b97241945596..5070ba7c6e89 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -685,6 +685,12 @@ static int mmu_topup_memory_caches(struct kvm_vcpu *vcpu, bool maybe_indirect)
>  				       1 + PT64_ROOT_MAX_LEVEL + PTE_PREFETCH_NUM);
>  	if (r)
>  		return r;
> +	if (kvm_has_mirrored_tdp(vcpu->kvm)) {
> +		r = kvm_mmu_topup_memory_cache(&vcpu->arch.mmu_mirrored_spt_cache,
> +					       PT64_ROOT_MAX_LEVEL);
> +		if (r)
> +			return r;
> +	}
>  	r = kvm_mmu_topup_memory_cache(&vcpu->arch.mmu_shadow_page_cache,
>  				       PT64_ROOT_MAX_LEVEL);
>  	if (r)
> @@ -704,6 +710,7 @@ static void mmu_free_memory_caches(struct kvm_vcpu *vcpu)
>  	kvm_mmu_free_memory_cache(&vcpu->arch.mmu_pte_list_desc_cache);
>  	kvm_mmu_free_memory_cache(&vcpu->arch.mmu_shadow_page_cache);
>  	kvm_mmu_free_memory_cache(&vcpu->arch.mmu_shadowed_info_cache);
> +	kvm_mmu_free_memory_cache(&vcpu->arch.mmu_mirrored_spt_cache);
>  	kvm_mmu_free_memory_cache(&vcpu->arch.mmu_page_header_cache);
>  }
>
> diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
> index 706f0ce8784c..faef40a561f9 100644
> --- a/arch/x86/kvm/mmu/mmu_internal.h
> +++ b/arch/x86/kvm/mmu/mmu_internal.h
> @@ -101,7 +101,22 @@ struct kvm_mmu_page {
>  		int root_count;
>  		refcount_t tdp_mmu_root_count;
>  	};
> -	unsigned int unsync_children;
> +	union {
> +		/* Those two members aren't used for TDP MMU */
> +		struct {
> +			unsigned int unsync_children;
> +			/*
> +			 * Number of writes since the last time traversal
> +			 * visited this page.
> +			 */
> +			atomic_t write_flooding_count;
> +		};
> +		/*
> +		 * Page table page of private PT.
> +		 * Passed to TDX module, not accessed by KVM.
> +		 */
> +		void *mirrored_spt;
> +	};
>  	union {
>  		struct kvm_rmap_head parent_ptes; /* rmap pointers to parent sptes */
>  		tdp_ptep_t ptep;
> @@ -124,9 +139,6 @@ struct kvm_mmu_page {
>  	int clear_spte_count;
>  #endif
>
> -	/* Number of writes since the last time traversal visited this page.  */
> -	atomic_t write_flooding_count;
> -
>  #ifdef CONFIG_X86_64
>  	/* Used for freeing the page asynchronously if it is a TDP MMU page. */
>  	struct rcu_head rcu_head;
> @@ -145,6 +157,33 @@ static inline int kvm_mmu_page_as_id(struct kvm_mmu_page *sp)
>  	return kvm_mmu_role_as_id(sp->role);
>  }
>
> +static inline void *kvm_mmu_mirrored_spt(struct kvm_mmu_page *sp)
> +{
> +	return sp->mirrored_spt;
> +}
> +
> +static inline void kvm_mmu_alloc_mirrored_spt(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
> +{
> +	/*
> +	 * mirrored_spt is allocated for TDX module to hold private EPT mappings,
> +	 * TDX module will initialize the page by itself.
> +	 * Therefore, KVM does not need to initialize or access mirrored_spt.
> +	 * KVM only interacts with sp->spt for mirrored EPT operations.
> +	 */
> +	sp->mirrored_spt = kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_mirrored_spt_cache);
> +}
> +
> +static inline void kvm_mmu_alloc_private_spt(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
> +{
> +	/*
> +	 * private_spt is allocated for TDX module to hold private EPT mappings,
> +	 * TDX module will initialize the page by itself.
> +	 * Therefore, KVM does not need to initialize or access private_spt.
> +	 * KVM only interacts with sp->spt for mirrored EPT operations.
> +	 */
> +	sp->mirrored_spt = kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_mirrored_spt_cache);
> +}
> +
>  static inline bool kvm_mmu_page_ad_need_write_protect(struct kvm_mmu_page *sp)
>  {
>  	/*
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> index 1259dd63defc..e7cd4921afe7 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.c
> +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> @@ -53,6 +53,7 @@ void kvm_mmu_uninit_tdp_mmu(struct kvm *kvm)
>
>  static void tdp_mmu_free_sp(struct kvm_mmu_page *sp)
>  {
> +	free_page((unsigned long)sp->mirrored_spt);
>  	free_page((unsigned long)sp->spt);
>  	kmem_cache_free(mmu_page_header_cache, sp);
>  }
> --
> 2.34.1
>
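
For illustration only, here is a rough userspace sketch of the packing idea from the quoted hunk, using the "external_spt" name suggested above. Everything in it is an assumption made for the example: struct sketch_mmu_page, sketch_alloc_external_spt() and sketch_free_sp() are hypothetical stand-ins, calloc()/free() replace the KVM memory cache and free_page(), and a plain int replaces atomic_t. It is not the kernel code, only a way to see that the two shadow-MMU counters and the TDP-MMU pointer can share storage.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical, userspace-only stand-in for the relevant part of
 * struct kvm_mmu_page.  Field names follow the review suggestion
 * ("external_spt"); the layout is simplified for illustration.
 */
struct sketch_mmu_page {
	void *spt;			/* mirror PT page, walked directly by KVM */
	union {
		/* Only meaningful for the shadow MMU. */
		struct {
			unsigned int unsync_children;
			int write_flooding_count;	/* atomic_t in the kernel */
		};
		/* Only meaningful for the TDP MMU: page handed to the TDX module. */
		void *external_spt;
	};
};

/* The pointer overlays the counters instead of adding a new field. */
static_assert(offsetof(struct sketch_mmu_page, external_spt) ==
	      offsetof(struct sketch_mmu_page, unsync_children),
	      "external_spt must share storage with the shadow-MMU counters");

/* Assumption: calloc() stands in for kvm_mmu_memory_cache_alloc(). */
static int sketch_alloc_external_spt(struct sketch_mmu_page *sp)
{
	sp->external_spt = calloc(1, 4096);
	return sp->external_spt ? 0 : -1;
}

/* Free both pages; free(NULL) is a no-op, so an unset pointer is fine. */
static void sketch_free_sp(struct sketch_mmu_page *sp)
{
	free(sp->external_spt);
	free(sp->spt);
	sp->external_spt = NULL;
	sp->spt = NULL;
}
```

A caller would zero-initialize a struct sketch_mmu_page, call sketch_alloc_external_spt() once, and release it with sketch_free_sp(). Because the members share storage, unsync_children and write_flooding_count must not be touched on a page that uses external_spt, which is the same constraint the union in the patch relies on for TDP MMU pages.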