On Fri, Apr 8, 2022 at 5:40 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
>
> On Fri, Apr 01, 2022, David Matlack wrote:
> > Add support for Eager Page Splitting pages that are mapped by the shadow
> > MMU. Walk through the rmap first splitting all 1GiB pages to 2MiB pages,
> > and then splitting all 2MiB pages to 4KiB pages.
> >
> > Splitting huge pages mapped by the shadow MMU requires dealing with some
> > extra complexity beyond that of the TDP MMU:
> >
> > (1) The shadow MMU has a limit on the number of shadow pages that are
> >     allowed to be allocated. So, as a policy, Eager Page Splitting
> >     refuses to split if there are KVM_MIN_FREE_MMU_PAGES or fewer
> >     pages available.
> >
> > (2) Huge pages may be mapped by indirect shadow pages which have the
> >     possibility of being unsync. As a policy we opt not to split such
> >     pages as their translation may no longer be valid.
>
> This shouldn't be possible, shadow pages whose role is > 4k are always
> write-protected and not allowed to become unsync.

Ah ok, then the unsync check is unnecessary (or at least could WARN_ON()).

> >
> > (3) Splitting a huge page may end up re-using existing lower level
> >     shadow page tables. This is unlike the TDP MMU which always
> >     allocates new shadow page tables when splitting.

> > ...

> > +static void kvm_mmu_split_huge_page(struct kvm *kvm,
> > +				    const struct kvm_memory_slot *slot,
> > +				    u64 *huge_sptep, struct kvm_mmu_page **spp)
> > +
> > +{
> > +	struct kvm_mmu_memory_cache *cache = &kvm->arch.huge_page_split_desc_cache;
> > +	u64 huge_spte = READ_ONCE(*huge_sptep);
> > +	struct kvm_mmu_page *sp;
> > +	bool flush = false;
> > +	u64 *sptep, spte;
> > +	gfn_t gfn;
> > +	int index;
> > +
> > +	sp = kvm_mmu_get_sp_for_split(kvm, slot, huge_sptep, spp);
> > +
> > +	for (index = 0; index < PT64_ENT_PER_PAGE; index++) {
> > +		sptep = &sp->spt[index];
> > +		gfn = kvm_mmu_page_get_gfn(sp, index);
> > +
> > +		/*
> > +		 * sp may have populated page table entries, e.g. if this huge
> > +		 * page is aliased by multiple sptes with the same access
> > +		 * permissions. We know the sptes will be mapping the same
> > +		 * gfn-to-pfn translation since sp is direct. However, a given
> > +		 * spte may point to an even lower level page table. We don't
> > +		 * know if that lower level page table is completely filled in,
> > +		 * i.e. we may be effectively unmapping a region of memory, so
> > +		 * we must flush the TLB.
>
> Random side topic, please avoid "we" and other pronouns in comments and
> changelogs, it gets real easy to lose track of what a pronoun is referring
> to, especially in changelogs where "we" might be KVM, might be the kernel,
> might be the team that's using the patch, might be an author that's prone
> to illeism, etc...

Agreed. It's a bad habit of mine, and despite conscious effort to limit use
of "we" in my comments and change logs, it inevitably creeps in. I'll do a
pass on this series to get rid of the use of "we" throughout.