Re: [RFC PATCH 13/15] KVM: x86/mmu: Split large pages during CLEAR_DIRTY_LOG

David Matlack <dmatlack@xxxxxxxxxx> · Tue, 30 Nov 2021 16:16:04 -0800

On Fri, Nov 26, 2021 at 4:17 AM Peter Xu <peterx@xxxxxxxxxx> wrote:
>
> On Fri, Nov 19, 2021 at 11:57:57PM +0000, David Matlack wrote:
> > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > index 6768ef9c0891..4e78ef2dd352 100644
> > --- a/arch/x86/kvm/mmu/mmu.c
> > +++ b/arch/x86/kvm/mmu/mmu.c
> > @@ -1448,6 +1448,12 @@ void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
> >               gfn_t start = slot->base_gfn + gfn_offset + __ffs(mask);
> >               gfn_t end = slot->base_gfn + gfn_offset + __fls(mask);
> >
> > +             /*
> > +              * Try to proactively split any large pages down to 4KB so that
> > +              * vCPUs don't have to take write-protection faults.
> > +              */
> > +             kvm_mmu_try_split_large_pages(kvm, slot, start, end, PG_LEVEL_4K);
> > +
> >               kvm_mmu_slot_gfn_write_protect(kvm, slot, start, PG_LEVEL_2M);
> >
> >               /* Cross two large pages? */
>
> Is it intended to try split every time even if we could have split it already?
> As I remember Paolo mentioned that we can skip split if it's not the 1st
> CLEAR_LOG on the same range, and IIUC that makes sense.
>
> But indeed I don't see a trivial way to know whether this is the first clear of
> this range.  Maybe we can maintain "how many huge pages are there under current
> kvm_mmu_page node" somehow?  Then if root sp has the counter==0, then we can
> skip it.  Just a wild idea..
>
> Or maybe it's intended to try split unconditionally for some reason?  If so
> it'll be great to mention that either in the commit message or in comments.

Thanks for calling this out. Could the same be said about the existing
code that unconditionally tries to write-protect 2M+ pages? I aimed to
keep parity with the write-protection calls (always try to split
before write-protecting) but I agree there might be opportunities
available to skip altogether.

By the way, looking at this code again I think I see some potential bugs:
 - I don't think I ever free split_caches in the initially-all-set case.
 - What happens if splitting fails the CLEAR_LOG but succeeds the
CLEAR_LOG? We would end up propagating the write-protection on the 2M
page down to the 4K page. This might cause issues if using PML.

>
> Thanks,
>
> --
> Peter Xu
>