On Mon, Aug 09, 2021, Maxim Levitsky wrote:
> On Tue, 2021-08-03 at 10:44 +0200, Paolo Bonzini wrote:
> > Reviewing this patch and the next one together.
> > 
> > On 02/08/21 20:33, Maxim Levitsky wrote:
> > > +static int avic_alloc_access_page(struct kvm *kvm)
> > >  {
> > >  	void __user *ret;
> > >  	int r = 0;
> > > 
> > >  	mutex_lock(&kvm->slots_lock);
> > > +
> > > +	if (kvm->arch.apic_access_memslot_enabled)
> > >  		goto out;
> > 
> > This variable is overloaded between "is access enabled" and "is the
> > memslot allocated".  I think you should check
> > kvm->arch.apicv_inhibit_reasons instead in kvm_faultin_pfn.
> > 
> > > +	if (!activate)
> > > +		kvm_zap_gfn_range(kvm, gpa_to_gfn(APIC_DEFAULT_PHYS_BASE),
> > > +				  gpa_to_gfn(APIC_DEFAULT_PHYS_BASE + PAGE_SIZE));
> > > +
> > 
> > Off by one, the last argument of kvm_zap_gfn_range is inclusive:
> 
> Actually, is it?

Nope.  The actual implementation is exclusive for both the legacy and TDP
MMUs.  And as you covered below, the fixed and variable MTRR helpers provide
an exclusive start+end, so there's no functional bug.

The "0 - ~0" use case is irrelevant because there can't be physical memory at
-4096.  The ~0ull case can be fixed by adding a helper to get the max possible
GFN, e.g. steal this code from kvm_tdp_mmu_put_root():

	gfn_t max_gfn = 1ULL << (shadow_phys_bits - PAGE_SHIFT);

and maybe add a comment saying it intentionally ignores guest.MAXPHYADDR (from
CPUID) so that the helper can be used even while CPUID is being modified.

> There are 3 uses of this function.  Two of them (kvm_post_set_cr0 and one
> case in update_mtrr) use 0,~0ULL, which is indeed inclusive, but for the
> variable MTRRs I see this code in var_mtrr_range:
> 
> 	*end = (*start | ~mask) + 1;
> 
> and the *end is passed to kvm_zap_gfn_range.
> 
> Another thing I noticed: I added calls to
> kvm_inc_notifier_count/kvm_dec_notifier_count in kvm_zap_gfn_range, but
> those do seem to take non-inclusive ends, so sadly I need to fix my calls
> if that is the case.  This depends on mmu_notifier_ops and it is not
> documented well.
> 
> However, at least mmu_notifier_retry_hva does assume a non-inclusive
> range, since it checks:
> 
> 	hva >= kvm->mmu_notifier_range_start &&
> 	hva < kvm->mmu_notifier_range_end
> 
> Also, looking at the algorithm of kvm_zap_gfn_range: suppose that
> gfn_start == gfn_end and we have a memslot with one page at gfn_start.
> Then:
> 
> 	start = max(gfn_start, memslot->base_gfn);               // start = memslot->base_gfn
> 	end = min(gfn_end, memslot->base_gfn + memslot->npages); // end = memslot->base_gfn
> 
> 	if (start >= end)
> 		continue;
> 
> In this case it seems that it will do nothing.  So I suspect that
> kvm_zap_gfn_range actually expects a non-inclusive range, but because it
> isn't used much, the off-by-one didn't cause trouble.
> 
> Another thing I found in kvm_zap_gfn_range:
> 
> 	kvm_flush_remote_tlbs_with_address(kvm, gfn_start, gfn_end);
> 
> but kvm_flush_remote_tlbs_with_address expects
> (struct kvm *kvm, u64 start_gfn, u64 pages).

Heh, surprise, surprise, a rare path with no architecturally visible effects
is busted :-)

> kvm_flush_remote_tlbs_with_address is also for some reason called twice
> with the same parameters.

It's called twice in the current code because mmu_lock is dropped between
handling the legacy MMU and the TDP MMU.

> Could you help with that?  Am I missing something?
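
No, I don't think you're missing anything, nice catch(es).  For the flush,
since gfn_end is exclusive, I _think_ the fix is simply to pass the delta as
the page count (untested, not even compiled):

	/* gfn_end is exclusive, so the number of pages is the difference. */
	kvm_flush_remote_tlbs_with_address(kvm, gfn_start,
					   gfn_end - gfn_start);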
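
Going back to the ~0ull case, here's an untested sketch of the helper I have
in mind; kvm_mmu_max_gfn() is a made-up name, put it wherever shadow_phys_bits
is visible:

	/*
	 * The max GFN KVM can possibly map, as an exclusive bound.
	 * Intentionally based on shadow_phys_bits, not guest.MAXPHYADDR, so
	 * that the helper is safe to use even while vCPU CPUID is being
	 * modified.
	 */
	static inline gfn_t kvm_mmu_max_gfn(void)
	{
		return 1ULL << (shadow_phys_bits - PAGE_SHIFT);
	}

and the callers that currently pass ~0ULL as the end would become:

	kvm_zap_gfn_range(kvm, 0, kvm_mmu_max_gfn());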
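
For the notifier counts, I agree the ends are non-inclusive; the
"hva < kvm->mmu_notifier_range_end" check in mmu_notifier_retry_hva settles
it.  So as long as kvm_zap_gfn_range keeps its exclusive end, your new calls
can forward the range as-is, e.g. something like (assuming the helpers take a
plain start+end pair):

	/* Both helpers treat the end as exclusive, same as the zap itself. */
	kvm_inc_notifier_count(kvm, gfn_start, gfn_end);
	...
	kvm_dec_notifier_count(kvm, gfn_start, gfn_end);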
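
Regardless of how this shakes out, can you also document the contract so the
next person doesn't have to reverse engineer it?  Wording is just a
suggestion:

	/*
	 * Zap all SPTEs in the range [gfn_start, gfn_end), i.e. gfn_end is
	 * exclusive; callers must not pass an inclusive end.
	 */
	void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end);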