Re: [PATCH RFC v7 04/64] KVM: x86: Add 'fault_is_private' x86 op

Michael Roth <michael.roth@xxxxxxx> · Mon, 20 Feb 2023 10:22:10 -0600

On Fri, Jan 13, 2023 at 03:48:59PM +0000, Sean Christopherson wrote:
> On Fri, Jan 13, 2023, Borislav Petkov wrote:
> > On Wed, Jan 04, 2023 at 08:42:56PM -0600, Michael Roth wrote:
> > > Obviously I need to add some proper documentation for this, but a 1
> > > return basically means 'private_fault' pass-by-ref arg has been set
> > > with the appropriate value, whereas 0 means "there's no platform-specific
> > > handling for this, so if you have some generic way to determine this
> > > then use that instead".
> > 
> > Still binary, tho, and can be bool, right?
> > 
> > I.e., you can just as well do:
> > 
> >         if (static_call(kvm_x86_fault_is_private)(kvm, gpa, err, &private_fault))
> >                 goto out;
> > 
> > at the call site.
> 
> Ya.  Don't spend too much time trying to make this look super pretty though, there
> are subtle bugs inherited from the base UPM series that need to be sorted out and
> will impact this code.  E.g. invoking kvm_mem_is_private() outside of the protection
> of mmu_invalidate_seq means changes to the attributes may not be reflected in the
> page tables.
> 
> I'm also hoping we can avoid a callback entirely, though that may prove to be
> more pain than gain.  I'm poking at the UPM and testing series right now, will
> circle back to this and TDX in a few weeks to see if there's a sane way to communicate
> shared vs. private without having to resort to a callback, and without having
> races between page faults, KVM_SET_MEMORY_ATTRIBUTES, and KVM_SET_USER_MEMORY_REGION2.

Can circle back on this, but for v8 at least I've kept the callback, but
simplified SVM implementation of it so that it's only needed for SNP. For
protected-SEV it will fall through to the same generic handling used by UPM
self-tests.

It seems like it's safe to have a callback of that sort here for TDX/SNP (or
whatever we end up replacing the callback with), since the #NPF flags
themselves won't change based on attribute updates, and the subsequent
comparison to kvm_mem_is_private() will happen after mmu_invalidate_seq
is logged.

But for protected-SEV and UPM selftests the initial kvm_mem_is_private()
can become stale vs. the one in __kvm_faultin_pfn(), but it seems like ATM
it would only lead to a spurious KVM_EXIT_MEMORY_FAULT, which SEV at least
should treat at an implicit page-state change and be able to recover from.
But yah, not ideal, and maybe for self-tests that makes it difficult to tell
if things are working as expected or not.

Maybe we should just skip setting fault->is_private here in the
non-TDX/non-SNP cases, and just have some other indicator so it's
initialized/ignored in kvm_mem_is_private() later. I think some iterations
of UPM did it this way prior to 'is_private' becoming const.

> 
> > > This is mainly to handle CONFIG_HAVE_KVM_PRIVATE_MEM_TESTING, which
> > > just parrots whatever kvm_mem_is_private() returns to support running
> > > KVM selftests without needed hardware/platform support. If we don't
> > > take care to skip this check where the above fault_is_private() hook
> > > returns 1, then it ends up breaking SNP in cases where the kernel has
> > > been compiled with CONFIG_HAVE_KVM_PRIVATE_MEM_TESTING, since SNP
> > > relies on the page fault flags to make this determination, not
> > > kvm_mem_is_private(), which normally only tracks the memory attributes
> > > set by userspace via KVM_SET_MEMORY_ATTRIBUTES ioctl.
> > 
> > Some of that explanation belongs into the commit message, which is a bit
> > lacking...
> 
> I'll circle back to this too when I give this series (and TDX) a proper look,
> there's got too be a better way to handle this.
> 

It seems like for SNP/TDX we just need to register the shared/encrypted
bits with KVM MMU and let it handle checking the #NPF flags, but can
iterate on that for the next spin when we have a better idea what it
should look like.

-Mike