Re: [PATCH 0/3] KVM: x86: SGX vs. XCR0 cleanups

Sean Christopherson <seanjc@xxxxxxxxxx> · Wed, 5 Apr 2023 19:10:40 -0700

On Wed, Apr 05, 2023, Huang, Kai wrote:
> On Tue, 2023-04-04 at 17:59 -0700, Sean Christopherson wrote:
> > *** WARNING *** ABI breakage.
> > 
> > Stop adjusting the guest's CPUID info for the allowed XFRM (a.k.a. XCR0)
> > for SGX enclaves.  Past me didn't understand the roles and responsibilities
> > between userspace and KVM with respect to CPUID leafs, i.e. I thought I was
> > being helpful by having KVM adjust the entries.
> 
> Actually I am not clear about this topic.
> 
> So the rule is KVM should never adjust CPUID entries passed from userspace?

Yes, except for true runtime entries where a CPUID leaf is dynamic based on other
CPU state, e.g. CR4 bits, MISC_ENABLES in the MONITOR/MWAIT case, etc.

> What if the userspace passed the incorrect CPUID entries?  Should KVM sanitize
> those CPUID entries to ensure there's no insane configuration?  My concern is if
> we allow guest to be created with insane CPUID configurations, the guest can be
> confused and behaviour unexpectedly.

It is userspace's responsibility to provide a sane, correct setup.  The one
exception is that KVM rejects KVM_SET_CPUID{2} if userspace attempts to define an
unsupported virtual address width, the argument being that a malicious userspace
could attack KVM by coercing KVM into stuff a non-canonical address into e.g. a
VMCS field.

The reason for KVM punting to userspace is that it's all but impossible to define
what is/isn't sane.  A really good example would be an alternative we (Google)
considered for the "smaller MAXPHYADDR" fiasco, the underlying problem being that
migrating a vCPU with MAXPHYADDR=46 to a system with MAXPHYADDR=52 will incorrectly
miss reserved bit #PFs.

Rather than teach KVM to try and deal with smaller MAXPHYADDRs, an idea we considered
was to instead enumerate guest.MAXPHYADDR=52 on platforms with host.MAXPHYADDR=46 in
anticipation of eventual migration.  So long as userspace doesn't actually enumerate
memslots in the illegal address space, KVM would be able to treat such accesses as
emulated MMIO, and would only need to intercept #PF(RSVD).

Circling back to "what's sane", enumerating guest.MAXPHYADDR > host.MAXPHYADDR
definitely qualifies as insane since it really can't work correctly, but in our
opinion it was far superior to running with allow_smaller_maxphyaddr=true.

And sane is not the same thing as architecturally legal.  AMX is a good example
of this.  It's _technically_ legal to enumerate support for XFEATURE_TILE_CFG but
not XFEATURE_TILE_DATA in CPUID, but illegal to actually try to enable TILE_CFG
in XCR0 without also enabling TILE_DATA.  KVM should arguably reject CPUID configs
with TILE_CFG but not TILE_DATA, and vice versa, but then KVM is rejecting a 100%
architecturally valid, if insane, CPUID configuration.  Ditto for nearly all of
the VMX control bits versus their CPUID counterparts.

And sometimes there are good reasons to run a VM with a truly insane configuration,
e.g. for testing purposes.

TL;DR: trying to enforce "sane" CPUID/feature configuration is a gigantic can of worms.