Re: [PATCH v15 21/23] KVM: MMU: Disable fast path for private memslots

Sean Christopherson <seanjc@xxxxxxxxxx> · Fri, 10 May 2024 08:59:09 -0700

On Fri, May 10, 2024, Michael Roth wrote:
> On Fri, May 10, 2024 at 03:50:26PM +0200, Paolo Bonzini wrote:
> > On Fri, May 10, 2024 at 3:47 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> > >
> > > > +      * Since software-protected VMs don't have a notion of a shared vs.
> > > > +      * private that's separate from what KVM is tracking, the above
> > > > +      * KVM_EXIT_MEMORY_FAULT condition wouldn't occur, so avoid the
> > > > +      * special handling for that case for now.
> > >
> > > Very technically, it can occur if userspace _just_ modified the attributes.  And
> > > as I've said multiple times, at least for now, I want to avoid special casing
> > > SW-protected VMs unless it is *absolutely* necessary, because their sole purpose
> > > is to allow testing flows that are impossible to excercise without SNP/TDX hardware.
> > 
> > Yep, it is not like they have to be optimized.
> 
> Ok, I thought there were maybe some future plans to use sw-protected VMs
> to get some added protections from userspace. But even then there'd
> probably still be extra considerations for how to handle access tracking
> so white-listing them probably isn't right anyway.
> 
> I was also partly tempted to take this route because it would cover this
> TDX patch as well:
> 
>   https://lore.kernel.org/lkml/91c797997b57056224571e22362321a23947172f.1705965635.git.isaku.yamahata@xxxxxxxxx/

Hmm, I'm pretty sure that patch is trying to fix the exact same issue you are
fixing, just in a less precise way.  S-EPT entries only support RWX=0 and RWX=111b,
i.e. it should be impossible to have a write-fault to a present S-EPT entry.

And if TDX is running afoul of this code:

	if (!fault->present)
		return !kvm_ad_enabled();

then KVM should do the sane thing and require A/D support be enabled for TDX.

And if it's something else entirely, that changelog has some explaining to do.

> and avoid any weirdness about checking kvm_mem_is_private() without
> checking mmu_invalidate_seq, but I think those cases all end up
> resolving themselves eventually and added some comments around that.

Yep, checking state that is protected by mmu_invalidate_seq outside of mmu_lock
is definitely allowed, e.g. the entire fast page fault path operates outside of
mmu_lock and thus outside of mmu_invalidate_seq's purview.

It's a-ok because the SPTE are done with an atomic CMPXCHG, and so KVM only needs
to ensure its page tables aren't outright _freed_.  If the zap triggered by the
attributes change "wins", then the fast #PF path will fail the CMPXCHG and be an
expensive NOP.  If the fast #PF wins, the zap will pave over the fast #PF fix,
and the IPI+flush that is needed for all zaps, to ensure vCPUs don't have stale
references, does the rest.

And if there's an attributes race that causes the fast #PF to bail early, the vCPU
will see the correct state on the next page fault.