On Wed, Aug 14, 2024, Chao Gao wrote:
> On Tue, Aug 13, 2024 at 06:16:10PM -0700, Sean Christopherson wrote:
> >On Wed, Aug 14, 2024, Chao Gao wrote:
> >> On Tue, Aug 13, 2024 at 11:14:31PM +0800, Xiaoyao Li wrote:
> >> >On 8/13/2024 7:34 PM, Chao Gao wrote:
> >> >> I think adding new fixed-1 bits is fine as long as they don't break KVM, i.e.,
> >> >> KVM shouldn't need to take any action for the new fixed-1 bits, like
> >> >> saving/restoring more host CPU states across TD-enter/exit or emulating
> >> >> CPUID/MSR accesses from guests
> >> >
> >> >I disagree. Adding new fixed-1 bits in a newer TDX module can lead to a
> >> >different TD with same cpu model.
> >>
> >> The new TDX module simply doesn't support old CPU models.
> >
> >What happens if the new TDX module is needed to fix a security issue? Or if a
> >customer wants to support a heterogenous migration pool, and older (physical)
> >CPUs don't support the feature? Or if a customer wants to continue hosting
> >existing VM shapes on newer hardware?
> >
> >> QEMU can report an error and define a new CPU model that works with the TDX
> >> module. Sometimes, CPUs may drop features;
> >
> >Very, very rarely. And when it does happen, there are years of warning before
> >the features are dropped.
> >
> >> this may cause KVM to not support some features and in turn some old CPU
> >> models having those features cannot be supported. is it a requirement for
> >> TDX modules alone that old CPU models must always be supported?
> >
> >Not a hard requirement, but a pretty firm one. There needs to be sane, reasonable
> >behavior, or we're going to have problems.
>
> OK. So, the expectation is the TDX module should avoid adding new fixed-1 bits.
>
> I suppose this also applies to "native" CPUID bits, which are not configurable
> and simply reflected as native values to TDs.

Yes, unless all of Intel's customers are ok with the effective restriction that
the *only* valid vCPU model for a TDX VM is the real underlying CPU model. To me,
that seems like a poor bet to make. The cost of allowing feature bits to be
flexible isn't _that_ high, versus the potential cost of forcing customers to
change how they operate and manage VM shapes, CPU/platform upgrades, etc.

Maybe Intel has already had those conversations with product folk and everyone
is ok with the restriction, it just seems like very avoidable pain to me.

> One scenario where "fixed-1" bits can help is: we discover a security issue and
> release a microcode update to expose a feature indicating which CPUs are
> vulnerable. if the TDX module allows the VMM to configure the feature as 0
> (i.e., not vulnerable) on vulnerable CPUs, a TD might incorrectly assume it's
> not vulnerable, creating a security issue.
>
> I think in above case, the TDX module has to add a "fixed-1" bit. An example of
> such a feature is RRSBA in the IA32_ARCH_CAPABILITIES MSR.

That would be fine, I would classify that as reasonable. However, that scenario
doesn't really work in practice, at least not the way Intel probably hopes it
plays out. For the new fixed-1 bit to provide value, it would require a guest
reboot and likely a guest kernel upgrade.