Re: [PATCH 0/3] KVM: x86: SGX vs. XCR0 cleanups

"Huang, Kai" <kai.huang@xxxxxxxxx> · Wed, 12 Apr 2023 12:15:40 +0000

On Wed, 2023-04-05 at 19:10 -0700, Sean Christopherson wrote:
> On Wed, Apr 05, 2023, Huang, Kai wrote:
> > On Tue, 2023-04-04 at 17:59 -0700, Sean Christopherson wrote:
> > > *** WARNING *** ABI breakage.
> > > 
> > > Stop adjusting the guest's CPUID info for the allowed XFRM (a.k.a. XCR0)
> > > for SGX enclaves.  Past me didn't understand the roles and responsibilities
> > > between userspace and KVM with respect to CPUID leafs, i.e. I thought I was
> > > being helpful by having KVM adjust the entries.
> > 
> > Actually I am not clear about this topic.
> > 
> > So the rule is KVM should never adjust CPUID entries passed from userspace?
> 
> Yes, except for true runtime entries where a CPUID leaf is dynamic based on other
> CPU state, e.g. CR4 bits, MISC_ENABLES in the MONITOR/MWAIT case, etc.
> 
> > What if the userspace passed the incorrect CPUID entries?  Should KVM sanitize
> > those CPUID entries to ensure there's no insane configuration?  My concern is if
> > we allow guest to be created with insane CPUID configurations, the guest can be
> > confused and behaviour unexpectedly.
> 
> It is userspace's responsibility to provide a sane, correct setup.  The one
> exception is that KVM rejects KVM_SET_CPUID{2} if userspace attempts to define an
> unsupported virtual address width, the argument being that a malicious userspace
> could attack KVM by coercing KVM into stuff a non-canonical address into e.g. a
> VMCS field.

Sorry could you elaborate an example of such attack? :)

> 
> The reason for KVM punting to userspace is that it's all but impossible to define
> what is/isn't sane.  A really good example would be an alternative we (Google)
> considered for the "smaller MAXPHYADDR" fiasco, the underlying problem being that
> migrating a vCPU with MAXPHYADDR=46 to a system with MAXPHYADDR=52 will incorrectly
> miss reserved bit #PFs.
> 
> Rather than teach KVM to try and deal with smaller MAXPHYADDRs, an idea we considered
> was to instead enumerate guest.MAXPHYADDR=52 on platforms with host.MAXPHYADDR=46 in
> anticipation of eventual migration.  So long as userspace doesn't actually enumerate
> memslots in the illegal address space, KVM would be able to treat such accesses as
> emulated MMIO, and would only need to intercept #PF(RSVD).
> 
> Circling back to "what's sane", enumerating guest.MAXPHYADDR > host.MAXPHYADDR
> definitely qualifies as insane since it really can't work correctly, but in our
> opinion it was far superior to running with allow_smaller_maxphyaddr=true.

I guess everyone wants performance.

> 
> And sane is not the same thing as architecturally legal.  AMX is a good example
> of this.  It's _technically_ legal to enumerate support for XFEATURE_TILE_CFG but
> not XFEATURE_TILE_DATA in CPUID, but illegal to actually try to enable TILE_CFG
> in XCR0 without also enabling TILE_DATA.  KVM should arguably reject CPUID configs
> with TILE_CFG but not TILE_DATA, and vice versa, but then KVM is rejecting a 100%
> architecturally valid, if insane, CPUID configuration.  Ditto for nearly all of
> the VMX control bits versus their CPUID counterparts.
> 
> And sometimes there are good reasons to run a VM with a truly insane configuration,
> e.g. for testing purposes.
> 
> TL;DR: trying to enforce "sane" CPUID/feature configuration is a gigantic can of worms.

Agreed.  Thanks for the clarification.