Re: [PATCH 09/54] KVM: x86/mmu: Unconditionally zap unsync SPs when creating >4k SP at GFN

Sean Christopherson <seanjc@xxxxxxxxxx> · Wed, 23 Jun 2021 22:04:39 +0000

On Wed, Jun 23, 2021, Paolo Bonzini wrote:
> On 23/06/21 17:08, Sean Christopherson wrote:
> > Because the shadow page's level is incorporated into its role, if the level of
> > the new page is >4k, the branch at (1) will be taken for all 4k shadow pages.
> > 
> > Maybe something like this for a comment?
> 
> Good, integrated.
> 
> Though I also wonder why breaking out of the loop early is okay.  Initially I thought
> that zapping only matters if there's no existing page with the desired role,
> because otherwise the unsync page would have been zapped already by an earlier
> kvm_get_mmu_page, but what if the page was synced at the time of kvm_get_mmu_page
> and then both were unsynced?

That can't happen, because the new >4k SP will mark the page for write-tracking
via account_shadowed(), and any attempt to unsync the page will fail and
write-protect the entry.

It would be possible have both an unsync and a sync SP, e.g. unsync, then INVLPG
only one of the pages.  But as you pointed out, creating the first >4k SP would
be guaranteed to wipe out the unsync SP because no match should exist.

> It may be easier to just split the loop to avoid that additional confusion,
> something like:
> 
>         /*
>          * If the guest is creating an upper-level page, zap unsync pages
>          * for the same gfn, because the gfn will be write protected and
>          * future syncs of those unsync pages could happen with an incompatible
>          * context.

I don't think the part about "future syncs ... with an incompatible context" is
correct.  The unsync walks, i.e. the sync() flows, are done with the current root
and it should be impossible to reach a SP with an invalid context when walking
the child SPs.

I also can't find anything that would out break if the SP were left unsync, i.e.
I haven't found any code that assumes a write-protected SP can't be unsync.
E.g. mmu_try_to_unsync_pages() will force write-protection due to write tracking
even if unsync is left true.  Maybe there was a rule/assumption at some point
that has since gone away?  That's why my comment hedged and just said "don't
do it" without explaining why :-)

All that said, I'm definitely not opposed to simplying/clarifying the code and
ensuring all unsync SPs are zapped in this case.

>	   * While it's possible the guest is using recursive page
>          * tables, in all likelihood the guest has stopped using the unsync
>          * page and is installing a completely unrelated page.
>          */
>         if (level > PG_LEVEL_4K) {

I believe this can be "if (!direct && level > PG_LEVEL_4K)", because the direct
case won't write protect/track anything.

>                 for_each_valid_sp(vcpu->kvm, sp, sp_list)

This can technically be "for_each_gfn_indirect_valid_sp", though I'm not sure it
saves much, if anything.

>                         if (sp->gfn == gfn && sp->role.word != role.word && sp->unsync)
>                                 kvm_mmu_prepare_zap_page(vcpu->kvm, sp,
>                                                          &invalid_list);
>         }
> 
> Paolo
>