Re: [PATCH v11 018/113] KVM: TDX: create/destroy VM structure

David Matlack <dmatlack@xxxxxxxxxx> · Fri, 20 Jan 2023 14:21:31 -0800

On Thu, Jan 19, 2023 at 4:16 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
>
> On Thu, Jan 19, 2023, Huang, Kai wrote:
> > On Thu, 2023-01-19 at 21:36 +0000, Sean Christopherson wrote:
> > > The least invasive idea I have is expand the TDP MMU's concept of "frozen" SPTEs
> > > and freeze (a.k.a. lock) the SPTE (KVM's mirror) until the corresponding S-EPT
> > > update completes.
> >
> > This will introduce another "having-to-wait while SPTE is frozen" problem I
> > think, which IIUC means (one way is) you have to do some loop and retry, perhaps
> > similar to yield_safe.
>
> Yes, but because the TDP MMU already freezes SPTEs (just for a shorter duration),
> I'm 99% sure all of the affected flows already know how to yield/bail when necessary.
>
> The problem with the zero-step mitigation is that it could (theoretically) cause
> a "busy" error on literally any accesses, which makes it infeasible for KVM to have
> sane behavior.  E.g. freezing SPTEs to avoid the ordering issues isn't necessary
> when holding mmu_lock for write, whereas the zero-step madness brings everything
> into play.

(I'm still ramping up on TDX so apologies in advance if the following
is totally off base.)

The complexity, and to a lesser extent the memory overhead, of
mirroring Secure EPT tables with the TDP MMU makes me wonder if it is
really worth it. Especially since the initial TDX support has so many
constraints that would seem to allow a simpler implementation: all
private memory is pinned, no live migration support, no test/clear
young notifiers, etc.

For the initial version of KVM TDX support, what if we implemented the
Secure EPT management entirely off to the side, i.e. not on top of the
TDP MMU? For example, write TDX-specific routines for:

 - Fully populating the Secure EPT tree some time during VM creation.
 - Tearing down the Secure EPT tree during VM destruction.
 - Unmapping/mapping specific regions of the Secure EPT tree for
   private<->shared conversions.

With that in place, KVM would never need to handle a fault on a Secure
EPT mapping. Any fault (e.g. due to an in-progress private<->shared
conversion) can just return to the guest, which retries the memory
access until the operation is complete.
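
To make the fault path concrete, here is a rough sketch of the dispatch I
have in mind. It is a standalone toy, not kernel code: every name below
(toy_handle_ept_fault, the toy_fault_action values, the shared_mask
parameter) is hypothetical, and it assumes a single GPA bit distinguishes
shared from private addresses.

```c
#include <assert.h>
#include <stdint.h>

enum toy_fault_action {
	TOY_RESUME_GUEST,       /* re-enter the guest so it retries the access */
	TOY_HANDLE_IN_TDP_MMU,  /* shared mapping: take the normal fault path */
};

/*
 * Hypothetical dispatcher. shared_mask is the single GPA bit that marks an
 * address as shared; which bit that is depends on the VM configuration.
 */
enum toy_fault_action toy_handle_ept_fault(uint64_t gpa, uint64_t shared_mask)
{
	/* Assumption of this sketch: exactly one bit is set in shared_mask. */
	assert(shared_mask && !(shared_mask & (shared_mask - 1)));

	if (gpa & shared_mask)
		return TOY_HANDLE_IN_TDP_MMU;

	/*
	 * Private: the mapping is either already populated or in the middle
	 * of a conversion, so there is nothing to fix up here.
	 */
	return TOY_RESUME_GUEST;
}
```

The point is only that the private branch carries no MMU state at all; all
of the existing fault handling stays on the shared side.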

If we start with only supporting 4K pages in the Secure EPT, the
Secure EPT routines described above would be almost trivial to
implement. Huge pages would add some complexity, but I don't think it
would be terrible. Concurrency can be handled with a single lock since
we don't have to worry about concurrent faulting.
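
As a minimal sketch of the 4K-only, single-lock idea, here is a userspace
model of the three routines. Everything in it is an assumption for
illustration: the toy_sept names are made up, the flat per-frame array
stands in for the real Secure EPT tree, and the places marked in comments
are where a real implementation would call into the TDX module instead of
flipping a byte.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

enum toy_sept_state { TOY_SEPT_NONE = 0, TOY_SEPT_PRIVATE };

struct toy_sept {
	pthread_mutex_t lock;   /* the single lock covering all updates */
	uint64_t nr_gfns;       /* number of 4K guest frames tracked */
	uint8_t *state;         /* one toy_sept_state per 4K frame */
};

/* VM creation: map every 4K frame as private up front. */
int toy_sept_create(struct toy_sept *s, uint64_t nr_gfns)
{
	s->state = malloc(nr_gfns ? nr_gfns : 1);
	if (!s->state)
		return -1;
	s->nr_gfns = nr_gfns;
	pthread_mutex_init(&s->lock, NULL);
	for (uint64_t i = 0; i < nr_gfns; i++)
		s->state[i] = TOY_SEPT_PRIVATE; /* real code: add the page here */
	return 0;
}

/* VM destruction: tear the whole structure down. */
void toy_sept_destroy(struct toy_sept *s)
{
	pthread_mutex_destroy(&s->lock);
	free(s->state);
	s->state = NULL;
	s->nr_gfns = 0;
}

/* Private<->shared conversion of [gfn, gfn + nr), under the one lock. */
int toy_sept_set_range(struct toy_sept *s, uint64_t gfn, uint64_t nr,
		       enum toy_sept_state new_state)
{
	if (gfn > s->nr_gfns || nr > s->nr_gfns - gfn)
		return -1;

	pthread_mutex_lock(&s->lock);
	for (uint64_t i = gfn; i < gfn + nr; i++)
		s->state[i] = new_state; /* real code: map or unmap the page */
	pthread_mutex_unlock(&s->lock);
	return 0;
}

bool toy_sept_is_private(struct toy_sept *s, uint64_t gfn)
{
	bool ret;

	assert(gfn < s->nr_gfns);
	pthread_mutex_lock(&s->lock);
	ret = s->state[gfn] == TOY_SEPT_PRIVATE;
	pthread_mutex_unlock(&s->lock);
	return ret;
}
```

Since no fault path ever touches this structure, the lock is only contended
between conversions, which is why a single mutex seems sufficient for a
first version.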

This would avoid having TDX add a bunch of complexity to the TDP MMU
(which would only be used for shared mappings). If and when we want to
have more complicated memory management for TDX private mappings, we
could revisit TDP MMU integration. But I think this design could even
get us to the point of supporting dirty logging (where the only fault
KVM would have to handle for TDX private mappings would be a
write-protection fault). I'm not sure it would work for demand paging
(at the very least, performance would not be great behind a single
lock), but we can cross that bridge when we get there.