Re: [RFC PATCH 9/9] KVM: x86/mmu: Track SPTE accessed info across mmu_notifier PROT changes


 



On Mon, Aug 05, 2024 at 03:59:11PM +0800, Yuan Yao wrote:
> On Thu, Aug 01, 2024 at 11:34:53AM -0700, Sean Christopherson wrote:
> > Preserve Accessed information when zapping SPTEs in response to an
> > mmu_notifier protection change, e.g. if KVM is zapping SPTEs because
> > NUMA balancing kicked in.  KVM is not required to fully unmap the SPTE,
> > and the core VMA information isn't changing, i.e. the information is
> > still fresh and useful.
> >
> > Signed-off-by: Sean Christopherson <seanjc@xxxxxxxxxx>
> > ---
> >  arch/x86/kvm/mmu/tdp_mmu.c | 31 +++++++++++++++++++++++++------
> >  1 file changed, 25 insertions(+), 6 deletions(-)
> >
> > diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> > index ac3200ce00f9..780f35a22c05 100644
> > --- a/arch/x86/kvm/mmu/tdp_mmu.c
> > +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> > @@ -838,7 +838,8 @@ bool kvm_tdp_mmu_zap_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
> >   * operation can cause a soft lockup.
> >   */
> >  static bool tdp_mmu_zap_leafs(struct kvm *kvm, struct kvm_mmu_page *root,
> > -			      gfn_t start, gfn_t end, bool can_yield, bool flush)
> > +			      gfn_t start, gfn_t end, bool can_yield,
> > +			      bool keep_accessed_bit, bool flush)
> >  {
> >  	struct tdp_iter iter;
> >
> > @@ -849,17 +850,29 @@ static bool tdp_mmu_zap_leafs(struct kvm *kvm, struct kvm_mmu_page *root,
> >  	rcu_read_lock();
> >
> >  	for_each_tdp_pte_min_level(iter, root, PG_LEVEL_4K, start, end) {
> > +		u64 new_spte = SHADOW_NONPRESENT_VALUE;
> > +
> >  		if (can_yield &&
> >  		    tdp_mmu_iter_cond_resched(kvm, &iter, flush, false)) {
> >  			flush = false;
> >  			continue;
> >  		}
> >
> > +		/*
> > +		 * Note, this will fail to clear non-present, accessed SPTEs,
> > +		 * but that isn't a functional problem, it can only result in
> > +		 * a _potential_ false positive  in the unlikely scenario that
> > +		 * the primary MMU zaps an hva, reinstalls a new hva, and ages
> > +		 * the new hva, all before KVM accesses the hva.
> > +		 */
> >  		if (!is_shadow_present_pte(iter.old_spte) ||
> >  		    !is_last_spte(iter.old_spte, iter.level))
> >  			continue;
> >
> > -		tdp_mmu_iter_set_spte(kvm, &iter, SHADOW_NONPRESENT_VALUE);
> > +		if (keep_accessed_bit)
> > +			new_spte |= iter.old_spte & shadow_accessed_mask;
> > +
> > +		tdp_mmu_iter_set_spte(kvm, &iter, new_spte);
> >
> >  		/*
> >  		 * Zappings SPTEs in invalid roots doesn't require a TLB flush,
> > @@ -889,7 +902,7 @@ bool kvm_tdp_mmu_zap_leafs(struct kvm *kvm, gfn_t start, gfn_t end, bool flush)
> >
> >  	lockdep_assert_held_write(&kvm->mmu_lock);
> >  	for_each_valid_tdp_mmu_root_yield_safe(kvm, root, -1)
> > -		flush = tdp_mmu_zap_leafs(kvm, root, start, end, true, flush);
> > +		flush = tdp_mmu_zap_leafs(kvm, root, start, end, true, false, flush);
> >
> >  	return flush;
> >  }
> > @@ -1180,11 +1193,13 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
> >  bool kvm_tdp_mmu_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range,
> >  				 bool flush)
> >  {
> > +	bool keep_a_bit = range->arg.event == MMU_NOTIFY_PROTECTION_VMA ||
> > +			  range->arg.event == MMU_NOTIFY_PROTECTION_PAGE;
> >  	struct kvm_mmu_page *root;
> >
> >  	__for_each_tdp_mmu_root_yield_safe(kvm, root, range->slot->as_id, false)
> >  		flush = tdp_mmu_zap_leafs(kvm, root, range->start, range->end,
> > -					  range->may_block, flush);
> > +					  range->may_block, keep_a_bit, flush);
> >
> >  	return flush;
> >  }
> > @@ -1201,7 +1216,11 @@ static void kvm_tdp_mmu_age_spte(struct tdp_iter *iter)
> >  {
> >  	u64 new_spte;
> >
> > -	if (spte_ad_enabled(iter->old_spte)) {
> > +	if (spte_ad_enabled(iter->old_spte) ||
> > +	    !is_shadow_present_pte(iter->old_spte)) {
> > +		KVM_MMU_WARN_ON(!is_shadow_present_pte(iter->old_spte) &&
> > +				iter->old_spte != (SHADOW_NONPRESENT_VALUE | shadow_accessed_mask));
>
> Is it possible that some SPTEs are zapped by
> kvm_tdp_mmu_zap_leafs(keep_accessed_bit = false), e.g. from kvm_post_set_cr0(),
> and then handled by __kvm_tdp_mmu_age_gfn_range() for aging before the guest
> accesses them again?  In that scenario the SPTE is non-present with the A bit
> clear.
>
> > +
> >  		iter->old_spte = tdp_mmu_clear_spte_bits(iter->sptep,
> >  							 iter->old_spte,
> >  							 shadow_accessed_mask,
> > @@ -1235,7 +1254,7 @@ static bool __kvm_tdp_mmu_age_gfn_range(struct kvm *kvm,
> >  	for_each_valid_tdp_mmu_root(kvm, root, range->slot->as_id) {
> >  		rcu_read_lock();
> >
> > -		tdp_root_for_each_leaf_pte(iter, root, range->start, range->end) {
> > +		tdp_root_for_each_pte(iter, root, range->start, range->end) {
>
> This also clears the A bit of non-leaf entries when aging; I recall KVM
> didn't care about those before.  Could you please explain the reason for
> this change?

The new __kvm_tdp_mmu_age_gfn_range() covers both aging and testing, so the
iterator change lets it visit non-present SPTEs as well, because the A bit is
preserved in them and can still be tested.
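
To make that concrete, here is a minimal, self-contained sketch (not the
kernel code) of how a zapped-but-Accessed SPTE can still satisfy a
test_young-style check; the ILLUSTRATIVE_* constants merely stand in for
SHADOW_NONPRESENT_VALUE and shadow_accessed_mask:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Illustrative stand-ins only; the real values are SHADOW_NONPRESENT_VALUE
 * and shadow_accessed_mask from the kernel.
 */
#define ILLUSTRATIVE_NONPRESENT_VALUE	0ULL
#define ILLUSTRATIVE_ACCESSED_MASK	(1ULL << 8)

/* Zap a leaf SPTE, optionally carrying the Accessed bit into the new value. */
static uint64_t zap_leaf(uint64_t old_spte, bool keep_accessed_bit)
{
	uint64_t new_spte = ILLUSTRATIVE_NONPRESENT_VALUE;

	if (keep_accessed_bit)
		new_spte |= old_spte & ILLUSTRATIVE_ACCESSED_MASK;

	return new_spte;
}

/* A test_young-style check only has to look at the Accessed bit. */
static bool spte_is_young(uint64_t spte)
{
	return spte & ILLUSTRATIVE_ACCESSED_MASK;
}

int main(void)
{
	uint64_t spte = (0x1234ULL << 12) | ILLUSTRATIVE_ACCESSED_MASK;

	/* Zapped for a PROT change: non-present, but still reports young. */
	spte = zap_leaf(spte, true);
	printf("young after zap: %d\n", spte_is_young(spte));
	return 0;
}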

I was worried before that, because the access state is updated by
handle_changed_spte() when zapping, preserving the A bit could report an
inaccurate access state to a future .test_young() if no one accesses the
zapped guest page again.  But that should be addressed by patches 8 and 81
in the 'massive "follow pfn" rework' patch set mentioned in the cover
letter.
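
As a tiny illustration of that concern (again only a sketch, with
ILLUSTRATIVE_ACCESSED_MASK standing in for shadow_accessed_mask): the first
test_young after the zap may report a stale "young", but the next aging pass
clears the preserved bit even though the SPTE is non-present, so the
inaccuracy doesn't persist:

#include <stdint.h>
#include <stdio.h>

/* Illustrative stand-ins, as above. */
#define ILLUSTRATIVE_NONPRESENT_VALUE	0ULL
#define ILLUSTRATIVE_ACCESSED_MASK	(1ULL << 8)

int main(void)
{
	/* An Accessed leaf SPTE that was zapped with the A bit preserved. */
	uint64_t spte = ILLUSTRATIVE_NONPRESENT_VALUE | ILLUSTRATIVE_ACCESSED_MASK;

	/* First test_young after the zap may be a (harmless) false positive. */
	printf("test_young: %d\n", !!(spte & ILLUSTRATIVE_ACCESSED_MASK));

	/*
	 * The next aging pass clears the stale bit even on a non-present
	 * SPTE, so the stale result is limited to one round.
	 */
	spte &= ~ILLUSTRATIVE_ACCESSED_MASK;
	printf("test_young after aging: %d\n", !!(spte & ILLUSTRATIVE_ACCESSED_MASK));
	return 0;
}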

>
> >  			if (!is_accessed_spte(iter.old_spte))
> >  				continue;
> >
> > --
> > 2.46.0.rc1.232.g9752f9e123-goog
> >
> >
>



