On Wed, 2023-04-05 at 19:10 -0700, Sean Christopherson wrote: > On Wed, Apr 05, 2023, Huang, Kai wrote: > > On Tue, 2023-04-04 at 17:59 -0700, Sean Christopherson wrote: > > > *** WARNING *** ABI breakage. > > > > > > Stop adjusting the guest's CPUID info for the allowed XFRM (a.k.a. XCR0) > > > for SGX enclaves. Past me didn't understand the roles and responsibilities > > > between userspace and KVM with respect to CPUID leafs, i.e. I thought I was > > > being helpful by having KVM adjust the entries. > > > > Actually I am not clear about this topic. > > > > So the rule is KVM should never adjust CPUID entries passed from userspace? > > Yes, except for true runtime entries where a CPUID leaf is dynamic based on other > CPU state, e.g. CR4 bits, MISC_ENABLES in the MONITOR/MWAIT case, etc. > > > What if the userspace passed the incorrect CPUID entries? Should KVM sanitize > > those CPUID entries to ensure there's no insane configuration? My concern is if > > we allow guest to be created with insane CPUID configurations, the guest can be > > confused and behaviour unexpectedly. > > It is userspace's responsibility to provide a sane, correct setup. The one > exception is that KVM rejects KVM_SET_CPUID{2} if userspace attempts to define an > unsupported virtual address width, the argument being that a malicious userspace > could attack KVM by coercing KVM into stuff a non-canonical address into e.g. a > VMCS field. Sorry could you elaborate an example of such attack? :) > > The reason for KVM punting to userspace is that it's all but impossible to define > what is/isn't sane. A really good example would be an alternative we (Google) > considered for the "smaller MAXPHYADDR" fiasco, the underlying problem being that > migrating a vCPU with MAXPHYADDR=46 to a system with MAXPHYADDR=52 will incorrectly > miss reserved bit #PFs. > > Rather than teach KVM to try and deal with smaller MAXPHYADDRs, an idea we considered > was to instead enumerate guest.MAXPHYADDR=52 on platforms with host.MAXPHYADDR=46 in > anticipation of eventual migration. So long as userspace doesn't actually enumerate > memslots in the illegal address space, KVM would be able to treat such accesses as > emulated MMIO, and would only need to intercept #PF(RSVD). > > Circling back to "what's sane", enumerating guest.MAXPHYADDR > host.MAXPHYADDR > definitely qualifies as insane since it really can't work correctly, but in our > opinion it was far superior to running with allow_smaller_maxphyaddr=true. I guess everyone wants performance. > > And sane is not the same thing as architecturally legal. AMX is a good example > of this. It's _technically_ legal to enumerate support for XFEATURE_TILE_CFG but > not XFEATURE_TILE_DATA in CPUID, but illegal to actually try to enable TILE_CFG > in XCR0 without also enabling TILE_DATA. KVM should arguably reject CPUID configs > with TILE_CFG but not TILE_DATA, and vice versa, but then KVM is rejecting a 100% > architecturally valid, if insane, CPUID configuration. Ditto for nearly all of > the VMX control bits versus their CPUID counterparts. > > And sometimes there are good reasons to run a VM with a truly insane configuration, > e.g. for testing purposes. > > TL;DR: trying to enforce "sane" CPUID/feature configuration is a gigantic can of worms. Agreed. Thanks for the clarification.