On Mon, 2024-12-16 at 17:53 -0800, Sean Christopherson wrote: > Every new feature that lands in hardware needs to either be "benign" or have the > appropriate virtualization controls. KVM already has to deal with cases where > features can effectively be used without KVM's knowledge. E.g. there are plenty > of instruction-level virtualization holes, and SEV-ES doubled down by essentially > forcing KVM to let the guest write XCR0 and XSS directly. We discussed this in the PUCK call. It turns out there were two different ideas on how this fixed bit API would be used. One is to help userspace understand which configurations are possible. For this one, I'm not sure how helpful this proposal will be in the long run. I'll respond on the other branch of the thread. The other usage people were thinking of, which I didn't realize before, was to prevent the TDX module from setting fixed bits that might require VMMs support (i.e. save/restoring something that could affect the host). The rest of the mail is about this issue. Due to the steps involved in resolving this confusion, and that we didn't really reach a conclusion, the discussion is hard to summarize. So instead I'll try to re-kick it off with an idea which has bits and pieces of what people said... I think we can't have the TDX module setting new fixed bits that require any VMM enabling. When we finally have settled upstream TDX support, the TDX module needs to understand what things KVM relies on so it doesn't break them with updates. But new fixed CPUID bits that require VMM enabling to prevent host issues seems like the kind of thing in general that just shouldn't happen. As for new configurable bits that require VMM enabling. Adrian was suggesting that the TDX module currently only has two guest CPUID bits that are problematic for KVM today (and the next vcpu enter/exit series has a patch to forbid them). But a re-check of this assertion is warranted. It seems like an anti-pattern to have KVM maintaining any code to defend against TDX module changes that could instead be handled with a promise. However, KVM having code to defend against userspace prodding the TDX module to do something bad to the host seems valid. So fixed bit issues should be handled with a promise, but issues related to new configurable bits seems open. Some options discussed on the call: 1. If we got a promise to require any new CPUID bits that clobber host state to require an opt-in (attributes bit, etc) then we could get by with a promise for that too. The current situation was basically to assume TDX module wouldn't open up the issue with new CPUID bits (only attributes/xfam). 2. If we required any new configurable CPUID bits to save/restore host state automatically then we could also get by, but then KVM's code that does host save/restore would either be redundant or need a TDX branch. 3. If we prevent setting any CPUID bits not supported by KVM, we would need to track these bits in KVM. The data backing GET_SUPPORTED_CPUID is not sufficient for this purpose since it is actually more like "default values" then a mask of supported bits. A patch to try to do this filtering was dropped after upstream discussion.[0] Other idea ---------- Previously we tried to maintain an allow list of KVM supported configurable bits [0]. It was do-able, but not ideal. It would be smaller for KVM to protect itself with a deny list of bits, or rather a list of bits that needs to be in KVM_GET_SUPPORTED_CPUID, or they should not be allowed to be configured. But KVM can't keep a list of bits that it doesn't know about. But the TDX module does know which bits that it supports result in host state getting clobbered. So we could ask TDX module to expose a list of bits that have an effect on host state. We could check those against KVM_GET_SUPPORTED_CPUID. That check could be expected to fit better than when we tried to massage KVM_GET_SUPPORTED_CPUID to be a mask that includes all possible configurable bits (multi-bit fields, etc). In the meantime we could keep a list of all of today's host affecting bits. TDX module would need to gate any new bits that effect host state behind a new sys- wide opt-in that comes with the "clobber bits" metadata. Before entering a TD, KVM would check the clobber bits in KVM's copy of CPUID against the TD's copy to make sure everyone knows what they have to do. (and also this opt-in stuff would need to be run by the TDX module team of course) It leaves open the possibility that there is some other bits KVM cares about that don't have to do with clobbering host state. Not sure about it. [0] https://lore.kernel.org/kvm/20240812224820.34826-26-rick.p.edgecombe@xxxxxxxxx/