Re: [PATCH 0/2] KVM: x86/mmu: .change_pte() optimization in TDP MMU

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Aug 16, 2023 at 11:18:03AM -0700, Sean Christopherson wrote:
> On Tue, Aug 08, 2023, Yan Zhao wrote:
> > This series optmizes KVM mmu notifier.change_pte() handler in x86 TDP MMU
> > (i.e. kvm_tdp_mmu_set_spte_gfn()) by removing old dead code and prefetching
> > notified new PFN into SPTEs directly in the handler.
> > 
> > As in [1], .change_pte() has been dead code on x86 for 10+ years.
> > Patch 1 drops the dead code in x86 TDP MMU to save cpu cycles and prepare
> > for optimization in TDP MMU in patch 2.
> 
> If we're going to officially kill the long-dead attempt at optimizing KSM, I'd
> strongly prefer to rip out .change_pte() entirely, i.e. kill it off in all
> architectures and remove it from mmu_notifiers.  The only reason I haven't proposed
> such patches is because I didn't want to it to backfire and lead to someone trying
> to resurrect the optimizations for KSM.
> 
> > Patch 2 optimizes TDP MMU's .change_pte() handler to prefetch SPTEs in the
> > handler directly with PFN info contained in .change_pte() to avoid that
> > each vCPU write that triggers .change_pte() must undergo twice VMExits and
> > TDP page faults.
> 
> IMO, prefaulting guest memory as writable is better handled by userspace, e.g. by
> using QEMU's prealloc option.  It's more coarse grained, but at a minimum it's
> sufficient for improving guest boot time, e.g. by preallocating memory below 4GiB.
> 
> And we can do even better, e.g. by providing a KVM ioctl() to allow userspace to
> prefault memory not just into the primary MMU, but also into KVM's MMU.  Such an
> ioctl() is basically manadatory for TDX, we just need to morph the support being
> added by TDX into a generic ioctl()[*]
> 
> Prefaulting guest memory as writable into the primary MMU should be able to achieve
> far better performance than hooking .change_pte(), as it will avoid the mmu_notifier
> invalidation, e.g. won't trigger taking mmu_lock for write and the resulting remote
> TLB flush(es).  And a KVM ioctl() to prefault into KVM's MMU should eliminate page
> fault VM-Exits entirely.
> 
> Explicit prefaulting isn't perfect, but IMO the value added by prefetching in
> .change_pte() isn't enough to justify carrying the hook and the code in KVM.
> 
> [*] https://lore.kernel.org/all/ZMFYhkSPE6Zbp8Ea@xxxxxxxxxx
Hi Sean,
As I didn't write the full picture of patch 2 in the cover letter well,
may I request you to take a look of patch 2 to see if you like it? (in
case if you just read the cover letter).
What I observed is that each vCPU write to a COW page in primary MMU
will lead to twice TDP page faults.
Then, I just update the secondary MMU during the first TDP page fault
to avoid the second one.
It's not a blind prefetch (I checked the vCPU to ensure it's triggered
by a vCPU operation as much as possible) and it can benefit guests who
doesn't explicitly request a prefault memory as write.


Thanks
Yan



[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux