Re: [PATCH 0/3] KVM: x86: SGX vs. XCR0 cleanups

Sean Christopherson <seanjc@xxxxxxxxxx> · Wed, 12 Apr 2023 07:57:02 -0700

On Wed, Apr 12, 2023, Kai Huang wrote:
> On Wed, 2023-04-05 at 19:10 -0700, Sean Christopherson wrote:
> > On Wed, Apr 05, 2023, Huang, Kai wrote:
> > > On Tue, 2023-04-04 at 17:59 -0700, Sean Christopherson wrote:
> > > > *** WARNING *** ABI breakage.
> > > > 
> > > > Stop adjusting the guest's CPUID info for the allowed XFRM (a.k.a. XCR0)
> > > > for SGX enclaves.  Past me didn't understand the roles and responsibilities
> > > > between userspace and KVM with respect to CPUID leafs, i.e. I thought I was
> > > > being helpful by having KVM adjust the entries.
> > > 
> > > Actually I am not clear about this topic.
> > > 
> > > So the rule is KVM should never adjust CPUID entries passed from userspace?
> > 
> > Yes, except for true runtime entries where a CPUID leaf is dynamic based on other
> > CPU state, e.g. CR4 bits, MISC_ENABLES in the MONITOR/MWAIT case, etc.
> > 
> > > What if the userspace passed the incorrect CPUID entries?  Should KVM sanitize
> > > those CPUID entries to ensure there's no insane configuration?  My concern is if
> > > we allow guest to be created with insane CPUID configurations, the guest can be
> > > confused and behaviour unexpectedly.
> > 
> > It is userspace's responsibility to provide a sane, correct setup.  The one
> > exception is that KVM rejects KVM_SET_CPUID{2} if userspace attempts to define an
> > unsupported virtual address width, the argument being that a malicious userspace
> > could attack KVM by coercing KVM into stuff a non-canonical address into e.g. a
> > VMCS field.
> 
> Sorry could you elaborate an example of such attack? :)

Hrm, I was going to say that userspace could shove a noncanonical address in
MSR_FS/GS_BASE and trigger an unexpected VM-Fail (VMX) or ??? behavior on VMLOAD
(I don't think SVM consistency checks FS/GS.base).  But is_noncanonical_address()
queries CR4.LA57, not the address width from CPUID.0x80000008, which makes sense
enumearing 57 bits of virtual address space on a CPU without LA57 would also allow
shoving a bad value into hardware.

So even that example is bogus, i.e. commit dd598091de4a ("KVM: x86: Warn if guest
virtual address space is not 48-bits") really shouldn't have gone in.

> > The reason for KVM punting to userspace is that it's all but impossible to define
> > what is/isn't sane.  A really good example would be an alternative we (Google)
> > considered for the "smaller MAXPHYADDR" fiasco, the underlying problem being that
> > migrating a vCPU with MAXPHYADDR=46 to a system with MAXPHYADDR=52 will incorrectly
> > miss reserved bit #PFs.
> > 
> > Rather than teach KVM to try and deal with smaller MAXPHYADDRs, an idea we considered
> > was to instead enumerate guest.MAXPHYADDR=52 on platforms with host.MAXPHYADDR=46 in
> > anticipation of eventual migration.  So long as userspace doesn't actually enumerate
> > memslots in the illegal address space, KVM would be able to treat such accesses as
> > emulated MMIO, and would only need to intercept #PF(RSVD).
> > 
> > Circling back to "what's sane", enumerating guest.MAXPHYADDR > host.MAXPHYADDR
> > definitely qualifies as insane since it really can't work correctly, but in our
> > opinion it was far superior to running with allow_smaller_maxphyaddr=true.
> 
> I guess everyone wants performance.

Performance was a secondary concern, functional correctness was the main issue.
We were concerned that KVM would end up terminating healthy/sane guests due to
KVM's emulator being incomplete, i.e. if KVM failed to emulate an instruction in
the EPT violation handler when GPA > guest.MAXPHYADDR.  That, and SVM sets the
Accessed bit in the guest PTE before the NPT exit, i.e. KVM can't emulate a
smaller guest.MAXPHYADDR without creating an architectural violation from the
guest's perspective (a PTE with reserved bits should never set A/D bits).