On Thu, Aug 19, 2021 at 7:57 AM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
>
> Make a final call to direct_pte_prefetch_many() if there are "trailing"
> SPTEs to prefetch, i.e. SPTEs for GFNs following the faulting GFN. The
> call to direct_pte_prefetch_many() in the loop only handles the case
> where there are !PRESENT SPTEs preceding a PRESENT SPTE.
>
> E.g. if the faulting GFN is a multiple of 8 (the prefetch size) and all
> SPTEs for the following GFNs are !PRESENT, the loop will terminate with
> "start = sptep+1" and not prefetch any SPTEs.
>
> Prefetching trailing SPTEs as intended can drastically reduce the number
> of guest page faults, e.g. when accessing the first byte of every 4kb
> page in a 6gb chunk of virtual memory in a VM with 8gb of preallocated
> memory, the number of pf_fixed events observed in L0 drops from ~1.75M
> to <0.27M.
>
> Note, this only affects memory that is backed by 4kb pages as KVM doesn't
> prefetch when installing hugepages. Shadow paging prefetching is not
> affected as it does not batch the prefetches due to the need to process
> the corresponding guest PTE. The TDP MMU is not affected because it
> doesn't have prefetching, yet...
>
> Fixes: 957ed9effd80 ("KVM: MMU: prefetch ptes when intercepted guest #PF")
> Cc: Sergey Senozhatsky <senozhatsky@xxxxxxxxxx>
> Cc: Ben Gardon <bgardon@xxxxxxxxxx>
> Signed-off-by: Sean Christopherson <seanjc@xxxxxxxxxx>
> ---
>
> Cc'd Ben as this highlights a potential gap with the TDP MMU, which lacks
> prefetching of any sort. For large VMs, which are likely backed by
> hugepages anyway, this is a non-issue as the benefit of holding mmu_lock
> for read likely masks the cost of taking more VM-Exits. But VMs with a
> small number of vCPUs won't benefit as much from parallel page faults,
> e.g. there's no benefit at all if there's a single vCPU.
>
>  arch/x86/kvm/mmu/mmu.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index a272ccbddfa1..daf7df35f788 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -2818,11 +2818,13 @@ static void __direct_pte_prefetch(struct kvm_vcpu *vcpu,
>  			if (!start)
>  				continue;
>  			if (direct_pte_prefetch_many(vcpu, sp, start, spte) < 0)
> -				break;
> +				return;
>  			start = NULL;
>  		} else if (!start)
>  			start = spte;
>  	}
> +	if (start)
> +		direct_pte_prefetch_many(vcpu, sp, start, spte);
>  }

Reviewed-by: Lai Jiangshan <jiangshanlai@xxxxxxxxx>

>
>  static void direct_pte_prefetch(struct kvm_vcpu *vcpu, u64 *sptep)
> --
> 2.33.0.rc1.237.g0d66db33f3-goog
>
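
For readers unfamiliar with the idiom, the fix is the standard "flush the
pending run after the loop" pattern. Below is a minimal standalone sketch
of that pattern; the names (flush_run(), prefetch_window(), PREFETCH_NUM)
are hypothetical, not the actual KVM code, which operates on shadow PTEs
via direct_pte_prefetch_many():

    #include <stdio.h>

    #define PREFETCH_NUM 8

    /* Stand-in for direct_pte_prefetch_many(): "prefetch" the half-open run. */
    static void flush_run(int *start, int *end)
    {
    	printf("prefetching %td entries\n", end - start);
    }

    /*
     * Walk an aligned window of PREFETCH_NUM entries, batching runs of
     * absent (zero) entries. The faulting index is skipped, mirroring
     * the "spte == sptep" check in __direct_pte_prefetch().
     */
    static void prefetch_window(int *entries, int faulting_idx)
    {
    	int *e = entries, *start = NULL;
    	int i;

    	for (i = 0; i < PREFETCH_NUM; i++, e++) {
    		if (*e || i == faulting_idx) {	/* present, or the fault itself */
    			if (!start)
    				continue;
    			flush_run(start, e);
    			start = NULL;
    		} else if (!start) {
    			start = e;
    		}
    	}

    	/*
    	 * The final flush: without it, a run of absent entries that
    	 * reaches the end of the window (e.g. everything after the
    	 * faulting index) is silently dropped -- the bug being fixed.
    	 */
    	if (start)
    		flush_run(start, e);
    }

    int main(void)
    {
    	int entries[PREFETCH_NUM] = { 0 };	/* all entries absent */

    	/* Fault at index 0: only the trailing flush covers indices 1-7. */
    	prefetch_window(entries, 0);
    	return 0;
    }

Running the sketch with the fault at index 0 and every other entry absent
prints "prefetching 7 entries" from the trailing flush alone, which is
exactly the case the pre-patch loop missed.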