Re: [PATCH 08/16] KVM: x86/mmu: Bug the VM if kvm_zap_gfn_range() is called for TDX

Isaku Yamahata <isaku.yamahata@xxxxxxxxx> · Wed, 15 May 2024 17:15:30 -0700

On Thu, May 16, 2024 at 10:17:50AM +1200,
"Huang, Kai" <kai.huang@xxxxxxxxx> wrote:

> On 16/05/2024 4:22 am, Isaku Yamahata wrote:
> > On Wed, May 15, 2024 at 08:34:37AM -0700,
> > Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> > 
> > > > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > > > index d5cf5b15a10e..808805b3478d 100644
> > > > --- a/arch/x86/kvm/mmu/mmu.c
> > > > +++ b/arch/x86/kvm/mmu/mmu.c
> > > > @@ -6528,8 +6528,17 @@ void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end)
> > > >   	flush = kvm_rmap_zap_gfn_range(kvm, gfn_start, gfn_end);
> > > > -	if (tdp_mmu_enabled)
> > > > +	if (tdp_mmu_enabled) {
> > > > +		/*
> > > > +		 * kvm_zap_gfn_range() is used when MTRR or PAT memory
> > > > +		 * type was changed.  TDX can't handle zapping the private
> > > > +		 * mapping, but it's ok because KVM doesn't support either of
> > > > +		 * those features for TDX. In case a new caller appears, BUG
> > > > +		 * the VM if it's called for solutions with private aliases.
> > > > +		 */
> > > > +		KVM_BUG_ON(kvm_gfn_shared_mask(kvm), kvm);
> > > 
> > > Please stop using kvm_gfn_shared_mask() as a proxy for "is this TDX".  Using a
> > > > generic name quite obviously doesn't prevent TDX details from bleeding into common
> > > code, and dancing around things just makes it all unnecessarily confusing.
> > > 
> > > If we can't avoid bleeding TDX details into common code, my vote is to bite the
> > > bullet and simply check vm_type.
> > 
> > TDX has several aspects related to the TDP MMU.
> > 1) Based on the faulting GPA, determine which KVM page table to walk.
> >     (private-vs-shared)
> > 2) Need to call TDX SEAMCALL to operate on Secure-EPT instead of direct memory
> >     load/store.  TDP MMU needs hooks for it.
> > 3) The tables must be zapped from the leaf, not from the root or the middle.
> > 
> > For 1) and 2), what about something like this?  TDX backend code will set
> > kvm->arch.has_mirrored_pt = true; I think we will use kvm_gfn_shared_mask() only
> > for address conversion (shared<->private).
> > 
> > For 1), maybe we can add struct kvm_page_fault.walk_mirrored_pt
> >          (or whatever preferable name)?
> > 
> > For 3), flag of memslot handles it.
> > 
> > ---
> >   arch/x86/include/asm/kvm_host.h | 3 +++
> >   1 file changed, 3 insertions(+)
> > 
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index aabf1648a56a..218b575d24bd 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -1289,6 +1289,7 @@ struct kvm_arch {
> >   	u8 vm_type;
> >   	bool has_private_mem;
> >   	bool has_protected_state;
> > +	bool has_mirrored_pt;
> >   	struct hlist_head mmu_page_hash[KVM_NUM_MMU_PAGES];
> >   	struct list_head active_mmu_pages;
> >   	struct list_head zapped_obsolete_pages;
> > @@ -2171,8 +2172,10 @@ void kvm_configure_mmu(bool enable_tdp, int tdp_forced_root_level,
> >   #ifdef CONFIG_KVM_PRIVATE_MEM
> >   #define kvm_arch_has_private_mem(kvm) ((kvm)->arch.has_private_mem)
> > +#define kvm_arch_has_mirrored_pt(kvm) ((kvm)->arch.has_mirrored_pt)
> >   #else
> >   #define kvm_arch_has_private_mem(kvm) false
> > +#define kvm_arch_has_mirrored_pt(kvm) false
> >   #endif
> >   static inline u16 kvm_read_ldt(void)
> 
> I think this 'has_mirrored_pt' (or a better name) is better, because it
> clearly conveys it is for the "page table", but not the actual page that any
> page table entry maps to.
> 
> AFAICT we need to split the concept of "private page table itself" and the
> "memory type of the actual GFN".
> 
> E.g., both SEV-SNP and TDX have the concept of "private memory" (obviously), but
> I was told only TDX uses a dedicated private page table which isn't directly
> accessible for KVM.  SEV-SNP on the other hand just uses normal page table +
> additional HW managed table to ensure security.

kvm_mmu_page_role.is_private is no longer a good name.  Probably is_mirrored_pt,
need_callback, or whatever makes sense.

> In other words, I think we should decide whether to invoke TDP MMU callback
> for private mapping (the page table itself may just be normal one) depending
> on the fault->is_private, but not whether the page table is private:
> 
> 	if (fault->is_private && kvm_x86_ops->set_private_spte)
> 		kvm_x86_set_private_spte(...);
> 	else
> 		tdp_mmu_set_spte_atomic(...);

This doesn't work as-is, for two reasons.

- We would need to pass struct kvm_page_fault deep down the call chain only
  for this.  We could change the code that way.

- There is no struct kvm_page_fault in the zapping case.
  We could create a dummy one and pass it around.

Essentially, the issue is how to pass down is_private, stash the info
somewhere, or determine it some other way.  The options I can think of are:

- Pass around fault:
  Con: fault isn't currently passed down that deep
  Con: we need to create a fake fault for the zapping case

- Stash it in struct tdp_iter and pass around iter:
  Pro: works for the zapping case
  Con: we need to change the code to pass down tdp_iter

- Pass around is_private (or mirrored_pt or whatever):
  Pro: no need to add a member to any structure
  Con: we still need to pass it around

- Stash it in kvm_mmu_page:
  The patch series uses kvm_mmu_page.role.
  Pro: nothing extra to pass around because we already have struct kvm_mmu_page
  Con: need to tweak root page allocation

- Use gfn. kvm_is_private_gfn(kvm, gfn):
  Con: the use of gfn is confusing.  It's too TDX specific.
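
To make the kvm_mmu_page option more concrete, here is a rough user-space
sketch, not code from the series.  The structs are cut down to the fields used,
and is_mirrored_pt, enum spte_op, alloc_root(), alloc_child() and
pick_spte_op() are made-up names for illustration.  It assumes a child page
table inherits its parent's role, so only root allocation has to know about
mirroring, and both the fault path and the zap path can decide from the sp
alone.

```c
#include <assert.h>
#include <stdbool.h>

/* Cut-down stand-ins for the kernel structures; only the fields used here. */
struct kvm_mmu_page_role {
	unsigned int level : 4;
	unsigned int is_mirrored_pt : 1;	/* hypothetical rename of role.is_private */
};

struct kvm_mmu_page {
	struct kvm_mmu_page_role role;
};

/* Which path an SPTE update takes; hypothetical, for illustration only. */
enum spte_op {
	SPTE_OP_DIRECT,		/* plain load/store on the page table */
	SPTE_OP_CALLBACK,	/* also invoke the backend hook */
};

/* The only place that has to know whether the table is mirrored. */
static struct kvm_mmu_page alloc_root(unsigned int level, bool mirrored)
{
	struct kvm_mmu_page root = {
		.role = { .level = level, .is_mirrored_pt = mirrored },
	};

	return root;
}

/* Assumption: a child copies its parent's role and sits one level lower. */
static struct kvm_mmu_page alloc_child(const struct kvm_mmu_page *parent)
{
	struct kvm_mmu_page child = *parent;

	assert(parent->role.level > 1);
	child.role.level = parent->role.level - 1;
	return child;
}

/* Usable when faulting and when zapping: no fault or iter has to be passed. */
static enum spte_op pick_spte_op(const struct kvm_mmu_page *sp)
{
	return sp->role.is_mirrored_pt ? SPTE_OP_CALLBACK : SPTE_OP_DIRECT;
}
```

With this shape, the "Con" above reduces to alloc_root() growing one extra
parameter; everything below the root picks the role up from its parent.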

> And the 'has_mirrored_pt' should be only used to select the root of the page
> table that we want to operate on.

We can add one more bool to struct kvm_page_fault, e.g. follow_mirrored_pt or
something similar, to represent it, and initialize it in
__kvm_mmu_do_page_fault():

.follow_mirrored_pt = kvm->arch.has_mirrored_pt && kvm_is_private_gpa(gpa);
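
Spelled out as a stand-alone sketch (user-space, simplified structs, not the
real kernel definitions): gfn_shared_mask as a field of kvm_arch, make_fault(),
and kvm_is_private_gpa() taking a kvm argument are assumptions made for
illustration.  The private check assumes a GPA is private when a shared mask
exists and the GPA's GFN does not have the shared bit set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t gpa_t;
typedef uint64_t gfn_t;

#define PAGE_SHIFT 12

/* Simplified stand-ins; only the fields this sketch needs. */
struct kvm_arch {
	bool has_mirrored_pt;	/* set by the TDX backend in the proposal above */
	gfn_t gfn_shared_mask;	/* assumed: GFN bit marking a shared GPA, 0 if none */
};

struct kvm {
	struct kvm_arch arch;
};

struct kvm_page_fault {
	gpa_t addr;
	bool follow_mirrored_pt;	/* walk the mirrored page table for this fault */
};

static gfn_t gpa_to_gfn(gpa_t gpa)
{
	return gpa >> PAGE_SHIFT;
}

/* Assumed semantics: private means "shared bit exists and is clear". */
static bool kvm_is_private_gpa(const struct kvm *kvm, gpa_t gpa)
{
	gfn_t mask = kvm->arch.gfn_shared_mask;

	return mask && !(gpa_to_gfn(gpa) & mask);
}

/* Hypothetical helper mirroring the initializer in __kvm_mmu_do_page_fault(). */
static struct kvm_page_fault make_fault(const struct kvm *kvm, gpa_t gpa)
{
	assert(kvm != NULL);

	struct kvm_page_fault fault = {
		.addr = gpa,
		.follow_mirrored_pt = kvm->arch.has_mirrored_pt &&
				      kvm_is_private_gpa(kvm, gpa),
	};

	return fault;
}
```

The point is that has_mirrored_pt only selects the root to walk; whether the
GPA itself is private stays a separate question answered from the address.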

> This also gives a chance that if there's anything special needs to be done
> for page allocated for the "non-leaf" middle page table for SEV-SNP, it can
> just fit.

Can you please elaborate on this?
-- 
Isaku Yamahata <isaku.yamahata@xxxxxxxxx>