On Wed, Apr 12, 2023, Kai Huang wrote:
> On Wed, 2023-04-05 at 19:10 -0700, Sean Christopherson wrote:
> > On Wed, Apr 05, 2023, Huang, Kai wrote:
> > > On Tue, 2023-04-04 at 17:59 -0700, Sean Christopherson wrote:
> > > > *** WARNING *** ABI breakage.
> > > >
> > > > Stop adjusting the guest's CPUID info for the allowed XFRM (a.k.a. XCR0)
> > > > for SGX enclaves. Past me didn't understand the roles and responsibilities
> > > > between userspace and KVM with respect to CPUID leafs, i.e. I thought I was
> > > > being helpful by having KVM adjust the entries.
> > >
> > > Actually I am not clear about this topic.
> > >
> > > So the rule is KVM should never adjust CPUID entries passed from userspace?
> >
> > Yes, except for true runtime entries where a CPUID leaf is dynamic based on other
> > CPU state, e.g. CR4 bits, MISC_ENABLES in the MONITOR/MWAIT case, etc.
> >
> > > What if the userspace passed the incorrect CPUID entries? Should KVM sanitize
> > > those CPUID entries to ensure there's no insane configuration? My concern is if
> > > we allow guest to be created with insane CPUID configurations, the guest can be
> > > confused and behaviour unexpectedly.
> >
> > It is userspace's responsibility to provide a sane, correct setup. The one
> > exception is that KVM rejects KVM_SET_CPUID{2} if userspace attempts to define an
> > unsupported virtual address width, the argument being that a malicious userspace
> > could attack KVM by coercing KVM into stuff a non-canonical address into e.g. a
> > VMCS field.
>
> Sorry could you elaborate an example of such attack? :)

Hrm, I was going to say that userspace could shove a noncanonical address in
MSR_FS/GS_BASE and trigger an unexpected VM-Fail (VMX) or ??? behavior on VMLOAD
(I don't think SVM consistency checks FS/GS.base).
But is_noncanonical_address() queries CR4.LA57, not the address width from
CPUID.0x80000008, which makes sense because enumerating 57 bits of virtual
address space on a CPU without LA57 would also allow shoving a bad value into
hardware. So even that example is bogus, i.e. commit dd598091de4a ("KVM: x86:
Warn if guest virtual address space is not 48-bits") really shouldn't have gone
in.

> > The reason for KVM punting to userspace is that it's all but impossible to define
> > what is/isn't sane. A really good example would be an alternative we (Google)
> > considered for the "smaller MAXPHYADDR" fiasco, the underlying problem being that
> > migrating a vCPU with MAXPHYADDR=46 to a system with MAXPHYADDR=52 will incorrectly
> > miss reserved bit #PFs.
> >
> > Rather than teach KVM to try and deal with smaller MAXPHYADDRs, an idea we considered
> > was to instead enumerate guest.MAXPHYADDR=52 on platforms with host.MAXPHYADDR=46 in
> > anticipation of eventual migration. So long as userspace doesn't actually enumerate
> > memslots in the illegal address space, KVM would be able to treat such accesses as
> > emulated MMIO, and would only need to intercept #PF(RSVD).
> >
> > Circling back to "what's sane", enumerating guest.MAXPHYADDR > host.MAXPHYADDR
> > definitely qualifies as insane since it really can't work correctly, but in our
> > opinion it was far superior to running with allow_smaller_maxphyaddr=true.
>
> I guess everyone wants performance.

Performance was a secondary concern, functional correctness was the main issue.
We were concerned that KVM would end up terminating healthy/sane guests due to
KVM's emulator being incomplete, i.e. if KVM failed to emulate an instruction in
the EPT violation handler when GPA > guest.MAXPHYADDR. That, and SVM sets the
Accessed bit in the guest PTE before the NPT exit, i.e.
KVM can't emulate a smaller guest.MAXPHYADDR without creating an architectural
violation from the guest's perspective (a PTE with reserved bits should never
set A/D bits).
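
To make the "missed reserved bit #PF" problem concrete, here is a minimal
user-space sketch. It is not KVM code: rsvd_pa_bits() and pte_has_rsvd_bits()
are hypothetical helpers invented for illustration, and the sketch assumes
64-bit paging, where physical-address bits [51:MAXPHYADDR] of a PTE are
reserved. It only shows why a PTE the guest expects to fault under
MAXPHYADDR=46 is perfectly legal to hardware with MAXPHYADDR=52.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a KVM function): mask of the physical-address
 * bits in a 64-bit PTE that are reserved for a given MAXPHYADDR, i.e.
 * bits [51:maxphyaddr].  Assumes 64-bit paging with a 52-bit
 * architectural limit on physical addresses.
 */
static uint64_t rsvd_pa_bits(unsigned int maxphyaddr)
{
	const uint64_t pa_bits_51_0 = (1ULL << 52) - 1;

	assert(maxphyaddr >= 1 && maxphyaddr <= 52);

	/* Clear the bits that are legal for this MAXPHYADDR. */
	return pa_bits_51_0 & ~((1ULL << maxphyaddr) - 1);
}

/*
 * Hypothetical helper: true if the PTE sets any physical-address bit that
 * is reserved under the given MAXPHYADDR.  Only the address bits are
 * considered; other reserved bits (e.g. in large-page entries) are
 * ignored in this sketch.
 */
static bool pte_has_rsvd_bits(uint64_t pte, unsigned int maxphyaddr)
{
	return (pte & rsvd_pa_bits(maxphyaddr)) != 0;
}
```

With a present PTE that sets address bit 47, pte_has_rsvd_bits(pte, 46) is true
while pte_has_rsvd_bits(pte, 52) is false. That is, the guest expects
#PF(RSVD), but hardware with MAXPHYADDR=52 sees nothing wrong with the entry.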