Re: [PATCH v2 03/15] KVM: x86/mmu: Add a mirrored pointer to struct kvm_mmu_page

Paolo Bonzini <pbonzini@xxxxxxxxxx> · Thu, 6 Jun 2024 18:04:17 +0200

On Thu, May 30, 2024 at 11:07 PM Rick Edgecombe
<rick.p.edgecombe@xxxxxxxxx> wrote:
>
> From: Isaku Yamahata <isaku.yamahata@xxxxxxxxx>
>
> Add a mirrored pointer to struct kvm_mmu_page for the private page table and
> add helper functions to allocate/initialize/free a private page table page.
> Because KVM TDP MMU doesn't use unsync_children and write_flooding_count,
> pack them to have room for a pointer and use a union to avoid memory
> overhead.
>
> For private GPA, CPU refers to a private page table whose contents are
> encrypted. The dedicated APIs to operate on it (e.g. updating/reading its
> PTE entry) are used, and their cost is expensive.
>
> When KVM resolves the KVM page fault, it walks the page tables. To reuse
> the existing KVM MMU code and mitigate the heavy cost of directly walking
> the private page table, allocate one more page for the mirrored page table
> for the KVM MMU code to directly walk. Resolve the KVM page fault with
> the existing code, and do additional operations necessary for the private
> page table. To distinguish such cases, the existing KVM page table is
> called a shared page table (i.e., not associated with a private page
> table), and the page table with a private page table is called a mirrored
> page table. The relationship is depicted below.
>
>               KVM page fault                     |
>                      |                           |
>                      V                           |
>         -------------+----------                 |
>         |                      |                 |
>         V                      V                 |
>      shared GPA           private GPA            |
>         |                      |                 |
>         V                      V                 |
>     shared PT root      mirror PT root           |    private PT root
>         |                      |                 |           |
>         V                      V                 |           V
>      shared PT           mirror PT     --propagate-->  private/mirrored PT
>         |                      |                 |           |
>         |                      \-----------------+------\    |
>         |                                        |      |    |
>         V                                        |      V    V
>   shared guest page                              |    private guest page
>                                                  |
>                            non-encrypted memory  |    encrypted memory
>                                                  |
> PT: Page table
> Shared PT: visible to KVM, and the CPU uses it for shared mappings.
> Private/mirrored PT: the CPU uses it, but it is invisible to KVM.  TDX
>                      module updates this table to map private guest pages.
> Mirror PT: It is visible to KVM, but the CPU doesn't use it.  KVM uses it
>              to propagate PT change to the actual private PT.

Which one is the "Mirror" and which one is the "Mirrored" PT is
uncomfortably confusing.

I hate to bikeshed even more, but while I like "Mirror PT" (a lot), I
would stick with "Private" or perhaps "External" for the pages owned
by the TDX module.

> +       /*
> +        * This cache is to allocate private page table. E.g. private EPT used
> +        * by the TDX module.
> +        */
> +       struct kvm_mmu_memory_cache mmu_mirrored_spt_cache;

So this would be "mmu_external_spt_cache".

> -       unsigned int unsync_children;
> +       union {
> +               /* Those two members aren't used for TDP MMU */

s/Those/These/

> +               struct {
> +                       unsigned int unsync_children;
> +                       /*
> +                        * Number of writes since the last time traversal
> +                        * visited this page.
> +                        */
> +                       atomic_t write_flooding_count;
> +               };
> +               /*
> +                * Page table page of private PT.
> +                * Passed to TDX module, not accessed by KVM.
> +                */
> +               void *mirrored_spt;

external_spt

> +static inline void kvm_mmu_alloc_mirrored_spt(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
> +{
> +       /*
> +        * mirrored_spt is allocated for TDX module to hold private EPT mappings,
> +        * TDX module will initialize the page by itself.
> +        * Therefore, KVM does not need to initialize or access mirrored_spt.
> +        * KVM only interacts with sp->spt for mirrored EPT operations.
> +        */
> +       sp->mirrored_spt = kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_mirrored_spt_cache);
> +}
> +
> +static inline void kvm_mmu_alloc_private_spt(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
> +{
> +       /*
> +        * private_spt is allocated for TDX module to hold private EPT mappings,
> +        * TDX module will initialize the page by itself.
> +        * Therefore, KVM does not need to initialize or access private_spt.
> +        * KVM only interacts with sp->spt for mirrored EPT operations.
> +        */
> +       sp->mirrored_spt = kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_mirrored_spt_cache);
> +}

Duplicate function.

Naming aside, looks good.

Paolo

On Thu, May 30, 2024 at 11:07 PM Rick Edgecombe
<rick.p.edgecombe@xxxxxxxxx> wrote:
>
> From: Isaku Yamahata <isaku.yamahata@xxxxxxxxx>
>
> Add a mirrored pointer to struct kvm_mmu_page for the private page table and
> add helper functions to allocate/initialize/free a private page table page.
> Because KVM TDP MMU doesn't use unsync_children and write_flooding_count,
> pack them to have room for a pointer and use a union to avoid memory
> overhead.
>
> For private GPA, CPU refers to a private page table whose contents are
> encrypted. The dedicated APIs to operate on it (e.g. updating/reading its
> PTE entry) are used, and their cost is expensive.
>
> When KVM resolves the KVM page fault, it walks the page tables. To reuse
> the existing KVM MMU code and mitigate the heavy cost of directly walking
> the private page table, allocate one more page for the mirrored page table
> for the KVM MMU code to directly walk. Resolve the KVM page fault with
> the existing code, and do additional operations necessary for the private
> page table. To distinguish such cases, the existing KVM page table is
> called a shared page table (i.e., not associated with a private page
> table), and the page table with a private page table is called a mirrored
> page table. The relationship is depicted below.
>
>               KVM page fault                     |
>                      |                           |
>                      V                           |
>         -------------+----------                 |
>         |                      |                 |
>         V                      V                 |
>      shared GPA           private GPA            |
>         |                      |                 |
>         V                      V                 |
>     shared PT root      mirror PT root           |    private PT root
>         |                      |                 |           |
>         V                      V                 |           V
>      shared PT           mirror PT     --propagate-->  private/mirrored PT
>         |                      |                 |           |
>         |                      \-----------------+------\    |
>         |                                        |      |    |
>         V                                        |      V    V
>   shared guest page                              |    private guest page
>                                                  |
>                            non-encrypted memory  |    encrypted memory
>                                                  |
> PT: Page table
> Shared PT: visible to KVM, and the CPU uses it for shared mappings.
> Private/mirrored PT: the CPU uses it, but it is invisible to KVM.  TDX
>                      module updates this table to map private guest pages.
> Mirror PT: It is visible to KVM, but the CPU doesn't use it.  KVM uses it
>              to propagate PT change to the actual private PT.
>
> Add a helper kvm_has_mirrored_tdp() to trigger this behavior and wire it
> to the TDX vm type.
>
> Co-developed-by: Yan Zhao <yan.y.zhao@xxxxxxxxx>
> Signed-off-by: Yan Zhao <yan.y.zhao@xxxxxxxxx>
> Signed-off-by: Isaku Yamahata <isaku.yamahata@xxxxxxxxx>
> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@xxxxxxxxx>
> Reviewed-by: Binbin Wu <binbin.wu@xxxxxxxxxxxxxxx>
> ---
> TDX MMU Prep v2:
>  - Rename private->mirror
>  - Don't trigger off of shared mask
>
> TDX MMU Prep:
> - Rename terminology, dummy PT => mirror PT. and updated the commit message
>   By Rick and Kai.
> - Added a comment on union of private_spt by Rick.
> - Don't handle the root case in kvm_mmu_alloc_private_spt(), it will not
>   be needed in future patches. (Rick)
> - Update comments (Yan)
> - Remove kvm_mmu_init_private_spt(), open code it in later patches (Yan)
>
> v19:
> - typo in the comment in kvm_mmu_alloc_private_spt()
> - drop CONFIG_KVM_MMU_PRIVATE
> ---
>  arch/x86/include/asm/kvm_host.h |  5 ++++
>  arch/x86/kvm/mmu.h              |  5 ++++
>  arch/x86/kvm/mmu/mmu.c          |  7 +++++
>  arch/x86/kvm/mmu/mmu_internal.h | 47 ++++++++++++++++++++++++++++++---
>  arch/x86/kvm/mmu/tdp_mmu.c      |  1 +
>  5 files changed, 61 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index aabf1648a56a..250899a0239b 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -817,6 +817,11 @@ struct kvm_vcpu_arch {
>         struct kvm_mmu_memory_cache mmu_shadow_page_cache;
>         struct kvm_mmu_memory_cache mmu_shadowed_info_cache;
>         struct kvm_mmu_memory_cache mmu_page_header_cache;
> +       /*
> +        * This cache is to allocate private page table. E.g. private EPT used
> +        * by the TDX module.
> +        */
> +       struct kvm_mmu_memory_cache mmu_mirrored_spt_cache;
>
>         /*
>          * QEMU userspace and the guest each have their own FPU state.
> diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
> index dc80e72e4848..0c3bf89cf7db 100644
> --- a/arch/x86/kvm/mmu.h
> +++ b/arch/x86/kvm/mmu.h
> @@ -318,4 +318,9 @@ static inline gpa_t kvm_translate_gpa(struct kvm_vcpu *vcpu,
>                 return gpa;
>         return translate_nested_gpa(vcpu, gpa, access, exception);
>  }
> +
> +static inline bool kvm_has_mirrored_tdp(const struct kvm *kvm)
> +{
> +       return kvm->arch.vm_type == KVM_X86_TDX_VM;
> +}
>  #endif
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index b97241945596..5070ba7c6e89 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -685,6 +685,12 @@ static int mmu_topup_memory_caches(struct kvm_vcpu *vcpu, bool maybe_indirect)
>                                        1 + PT64_ROOT_MAX_LEVEL + PTE_PREFETCH_NUM);
>         if (r)
>                 return r;
> +       if (kvm_has_mirrored_tdp(vcpu->kvm)) {
> +               r = kvm_mmu_topup_memory_cache(&vcpu->arch.mmu_mirrored_spt_cache,
> +                                              PT64_ROOT_MAX_LEVEL);
> +               if (r)
> +                       return r;
> +       }
>         r = kvm_mmu_topup_memory_cache(&vcpu->arch.mmu_shadow_page_cache,
>                                        PT64_ROOT_MAX_LEVEL);
>         if (r)
> @@ -704,6 +710,7 @@ static void mmu_free_memory_caches(struct kvm_vcpu *vcpu)
>         kvm_mmu_free_memory_cache(&vcpu->arch.mmu_pte_list_desc_cache);
>         kvm_mmu_free_memory_cache(&vcpu->arch.mmu_shadow_page_cache);
>         kvm_mmu_free_memory_cache(&vcpu->arch.mmu_shadowed_info_cache);
> +       kvm_mmu_free_memory_cache(&vcpu->arch.mmu_mirrored_spt_cache);
>         kvm_mmu_free_memory_cache(&vcpu->arch.mmu_page_header_cache);
>  }
>
> diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
> index 706f0ce8784c..faef40a561f9 100644
> --- a/arch/x86/kvm/mmu/mmu_internal.h
> +++ b/arch/x86/kvm/mmu/mmu_internal.h
> @@ -101,7 +101,22 @@ struct kvm_mmu_page {
>                 int root_count;
>                 refcount_t tdp_mmu_root_count;
>         };
> -       unsigned int unsync_children;
> +       union {
> +               /* Those two members aren't used for TDP MMU */
> +               struct {
> +                       unsigned int unsync_children;
> +                       /*
> +                        * Number of writes since the last time traversal
> +                        * visited this page.
> +                        */
> +                       atomic_t write_flooding_count;
> +               };
> +               /*
> +                * Page table page of private PT.
> +                * Passed to TDX module, not accessed by KVM.
> +                */
> +               void *mirrored_spt;
> +       };
>         union {
>                 struct kvm_rmap_head parent_ptes; /* rmap pointers to parent sptes */
>                 tdp_ptep_t ptep;
> @@ -124,9 +139,6 @@ struct kvm_mmu_page {
>         int clear_spte_count;
>  #endif
>
> -       /* Number of writes since the last time traversal visited this page.  */
> -       atomic_t write_flooding_count;
> -
>  #ifdef CONFIG_X86_64
>         /* Used for freeing the page asynchronously if it is a TDP MMU page. */
>         struct rcu_head rcu_head;
> @@ -145,6 +157,33 @@ static inline int kvm_mmu_page_as_id(struct kvm_mmu_page *sp)
>         return kvm_mmu_role_as_id(sp->role);
>  }
>
> +static inline void *kvm_mmu_mirrored_spt(struct kvm_mmu_page *sp)
> +{
> +       return sp->mirrored_spt;
> +}
> +
> +static inline void kvm_mmu_alloc_mirrored_spt(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
> +{
> +       /*
> +        * mirrored_spt is allocated for TDX module to hold private EPT mappings,
> +        * TDX module will initialize the page by itself.
> +        * Therefore, KVM does not need to initialize or access mirrored_spt.
> +        * KVM only interacts with sp->spt for mirrored EPT operations.
> +        */
> +       sp->mirrored_spt = kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_mirrored_spt_cache);
> +}
> +
> +static inline void kvm_mmu_alloc_private_spt(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
> +{
> +       /*
> +        * private_spt is allocated for TDX module to hold private EPT mappings,
> +        * TDX module will initialize the page by itself.
> +        * Therefore, KVM does not need to initialize or access private_spt.
> +        * KVM only interacts with sp->spt for mirrored EPT operations.
> +        */
> +       sp->mirrored_spt = kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_mirrored_spt_cache);
> +}
> +
>  static inline bool kvm_mmu_page_ad_need_write_protect(struct kvm_mmu_page *sp)
>  {
>         /*
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> index 1259dd63defc..e7cd4921afe7 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.c
> +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> @@ -53,6 +53,7 @@ void kvm_mmu_uninit_tdp_mmu(struct kvm *kvm)
>
>  static void tdp_mmu_free_sp(struct kvm_mmu_page *sp)
>  {
> +       free_page((unsigned long)sp->mirrored_spt);
>         free_page((unsigned long)sp->spt);
>         kmem_cache_free(mmu_page_header_cache, sp);
>  }
> --
> 2.34.1
>