On Thu, Mar 28, 2024 at 09:21:37PM +0800, Xiaoyao Li wrote:
>On 3/28/2024 6:17 PM, Chao Gao wrote:
>> On Thu, Mar 28, 2024 at 11:40:27AM +0800, Xiaoyao Li wrote:
>> > On 3/28/2024 11:04 AM, Edgecombe, Rick P wrote:
>> > > On Thu, 2024-03-28 at 09:30 +0800, Xiaoyao Li wrote:
>> > > > > The current ABI of KVM_EXIT_X86_RDMSR when TDs are created is nothing. So I don't see how this is
>> > > > > any kind of ABI break. If you agree we shouldn't try to support MTRRs, do you have a different exit
>> > > > > reason or behavior in mind?
>> > > >
>> > > > Just return an error on TDVMCALL of RDMSR/WRMSR for the TD's access of MTRR MSRs.
>> > >
>> > > MTRR appears to be configured to be type "Fixed" in the TDX module. So the guest could expect to be
>> > > able to use it and be surprised by a #GP.
>> > >
>> > >         {
>> > >           "MSB": "12",
>> > >           "LSB": "12",
>> > >           "Field Size": "1",
>> > >           "Field Name": "MTRR",
>> > >           "Configuration Details": null,
>> > >           "Bit or Field Virtualization Type": "Fixed",
>> > >           "Virtualization Details": "0x1"
>> > >         },
>> > >
>> > > If KVM does not support MTRRs in TDX, then it has to return the error somewhere or pretend to
>> > > support it (do nothing but not return an error). Returning an error to the guest would be making up
>> > > arch behavior, and to a lesser degree so would ignoring the WRMSR.
>> >
>> > The root cause is that it's a bad design of TDX to make MTRR fixed1. When
>> > the guest reads MTRR CPUID as 1 while getting #VE on MTRR MSRs, it already breaks
>> > the architectural behavior. (MCA faces a similar issue: MCA is fixed1 as
>>
>> I won't say #VE on MTRR MSRs breaks anything. Writes to other MSRs (e.g. the
>> TSC_DEADLINE MSR) also lead to #VE. If KVM can emulate the MSR accesses, #VE
>> should be fine.
>>
>> The problem is: the MTRR CPUID feature is fixed to 1 while KVM/QEMU doesn't know
>> how to virtualize MTRR, especially given that KVM cannot control the memory type
>> in secure-EPT entries.
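To make the "Fixed" descriptor quoted above concrete, here is a minimal sketch, assuming a hypothetical C model of a TDX CPUID field (the enum, struct, and function names are illustrative, not TDX module or KVM ABI). It shows that a single-bit field typed "Fixed" with value 0x1 overrides whatever the VMM requests, so the TD always sees bit 12 set; the register is not named in the quoted entry, but bit 12 of CPUID(1).EDX is the architectural MTRR flag, so that is presumably where it lives.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one TDX CPUID field descriptor, mirroring the
 * JSON entry quoted above. Names are illustrative, not TDX module ABI. */
enum virt_type { VIRT_FIXED, VIRT_CONFIGURABLE };

struct cpuid_field {
    unsigned msb, lsb;      /* bit range within the CPUID register */
    enum virt_type type;    /* "Bit or Field Virtualization Type" */
    uint32_t fixed_value;   /* "Virtualization Details" when type is Fixed */
};

/* Register value the TD guest observes, given the value the VMM asked for.
 * Only single-bit fields are modeled in this sketch. */
static uint32_t guest_visible_reg(const struct cpuid_field *f, uint32_t vmm_reg)
{
    assert(f->msb == f->lsb && f->lsb < 32);
    uint32_t mask = UINT32_C(1) << f->lsb;

    if (f->type == VIRT_CONFIGURABLE)
        return vmm_reg;     /* the VMM's choice passes through unchanged */

    /* Fixed: the module forces the bit, whatever the VMM requested. */
    return f->fixed_value ? (vmm_reg | mask) : (vmm_reg & ~mask);
}

/* MTRR: bit 12, Fixed, value 0x1 (presumably CPUID(1).EDX). */
static const struct cpuid_field mtrr_field = { 12, 12, VIRT_FIXED, 0x1 };
```

With a configurable field (the TSC_DEADLINE case discussed further down), a VMM that clears the bit gets a guest that sees it clear; with the MTRR descriptor, clearing it has no effect, which is the mismatch with what KVM can actually virtualize.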
>
>Yes, I partly agree on that "#VE on MTRR MSRs breaks anything". #VE is not a
>problem; the problem is whether the #VE is opt-in or unconditional.

From the guest's p.o.v., there is no difference: the guest doesn't know whether a feature is opted in or not.

>
>For the TSC_DEADLINE_MSR, #VE is actually opt-in.
>CPUID(1).ECX[24].TSC_DEADLINE is configurable by the VMM. Only when the VMM
>configures the bit to 1 will the TD guest get #VE. If the VMM configures it to
>0, the TD guest just gets #GP. This is the reasonable design.
>
>> > well while accessing MCA-related MSRs gets #VE. This is why TDX is going to
>> > fix them by introducing a new feature and making them configurable.)
>> >
>> > > So that is why I lean towards
>> > > returning to userspace and giving the VMM the option to ignore it, return an error to the guest or
>> > > show an error to the user.
>> >
>> > "Show an error to the user" doesn't help at all, because the user cannot fix it,
>> > nor can QEMU.
>>
>> The key point isn't who can fix/emulate MTRR MSRs. It is just that KVM doesn't
>> know how to handle this situation and asks userspace for help.
>>
>> Whether or how userspace can handle the MSR writes isn't KVM's problem. It may be
>> better if KVM can tell userspace exactly in which cases KVM will exit to
>> userspace. But there is no such infrastructure.
>>
>> An example: in the KVM CET series, we found it complex for the KVM instruction
>> emulator to emulate control-flow instructions when CET is enabled. The
>> suggestion is also to punt to userspace (w/o any indication to userspace that
>> KVM would do this).
>
>Please point me to the decision on CET? I'm interested in how userspace can help
>on that.

https://lore.kernel.org/kvm/ZZgsipXoXTKyvCZT@xxxxxxxxxx/

>
>> >
>> > > If KVM can't support the behavior, better to get an actual error in
>> > > userspace than a mysterious guest hang, right?
>> > What behavior do you mean?
>> >
>> > > Outside of what kind of exit it is, do you object to the general plan to punt to userspace?
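The three options Rick lists for a punted MTRR access (ignore it, return an error to the guest, or show an error to the user) can be sketched from the VMM side. This is a minimal sketch assuming a hypothetical exit payload whose index/data/error fields are only loosely modeled on kvm_run's MSR exit; it is not the KVM_EXIT_TDX_VMCALL layout, and the policy names are made up. The MSR indices are the architectural MTRR MSRs, assuming 8 variable ranges.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical payload for an MSR access KVM did not handle itself. */
struct punted_msr_exit {
    bool is_write;
    uint32_t index;     /* MSR number */
    uint64_t data;      /* in: value written; out: value returned on read */
    uint8_t error;      /* out: nonzero asks for the access to fail */
};

/* The three VMM choices discussed in the thread (names hypothetical). */
enum mtrr_policy { MTRR_IGNORE, MTRR_FAIL, MTRR_REPORT };

/* Architectural MTRR MSRs; assumes 8 variable ranges (0x200-0x20F). */
static bool is_mtrr_msr(uint32_t index)
{
    if (index == 0xFE || index == 0x2FF)          /* MTRRCAP, MTRR_DEF_TYPE */
        return true;
    if (index >= 0x200 && index < 0x200 + 2 * 8)  /* PHYSBASEn/PHYSMASKn */
        return true;
    return index == 0x250 || index == 0x258 || index == 0x259 ||
           (index >= 0x268 && index <= 0x26F);    /* fixed-range MTRRs */
}

/* Returns true if the vCPU may be resumed, false if the VMM should stop. */
static bool handle_punted_msr(struct punted_msr_exit *e, enum mtrr_policy p)
{
    assert(e != NULL);
    if (!is_mtrr_msr(e->index)) {
        e->error = 1;           /* other MSRs are out of scope here: fail */
        return true;
    }
    switch (p) {
    case MTRR_IGNORE:
        if (!e->is_write)
            e->data = 0;        /* placeholder read value; writes dropped */
        e->error = 0;
        return true;
    case MTRR_FAIL:
        e->error = 1;           /* guest sees the access fail */
        return true;
    case MTRR_REPORT:
    default:
        fprintf(stderr, "TD accessed MTRR MSR %#x; not supported\n",
                (unsigned)e->index);
        return false;           /* surface the problem instead of resuming */
    }
}
```

Returning 0 for reads under MTRR_IGNORE is only a placeholder; a real VMM would have to pick values the guest tolerates, which is the "making up arch behavior" concern raised above.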
>> > >
>> > > Since this is a TDX-specific limitation, I guess there is KVM_EXIT_TDX_VMCALL as a general category
>> > > of TDVMCALLs that cannot be handled by KVM.
>>
>> Using KVM_EXIT_TDX_VMCALL looks fine.
>>
>> We need to explain why MTRR MSRs are handled in this way, unlike other MSRs.
>>
>> It is better if KVM can tell userspace that MTRR virtualization isn't supported
>> by KVM for TDs. Then userspace should resolve the conflict between KVM and the TDX
>> module on MTRR. But to report MTRR as unsupported, we need to make
>> GET_SUPPORTED_CPUID a vm-scope ioctl. I am not sure if it is worth the effort.
>
>My memory is that Sean disliked the vm-scope GET_SUPPORTED_CPUID for TDX when
>he was at Intel.

Ok. No strong opinion on this.
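If KVM could report MTRR as unsupported per VM, as suggested above, the conflict userspace has to resolve reduces to a mask check per CPUID register. A minimal sketch, assuming hypothetical inputs: the bits the TDX module forces to 1, and the bits a VM-scoped GET_SUPPORTED_CPUID would report as supported. None of these names are real KVM or QEMU interfaces.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical per-register inputs for one CPUID leaf/register. */
struct cpuid_reg_caps {
    uint32_t tdx_fixed1;     /* bits the TDX module forces to 1 */
    uint32_t kvm_supported;  /* bits KVM says it can virtualize for this VM */
};

/* Bits the guest will see set but KVM cannot back. */
static uint32_t fixed1_conflicts(const struct cpuid_reg_caps *c)
{
    return c->tdx_fixed1 & ~c->kvm_supported;
}

/* Launch-time check: 0 if there is no conflict, -1 after naming each
 * conflicting bit so the user sees the problem before the TD runs. */
static int check_fixed1(const struct cpuid_reg_caps *c, const char *reg)
{
    assert(c != NULL && reg != NULL);
    uint32_t bad = fixed1_conflicts(c);

    for (unsigned bit = 0; bit < 32; bit++)
        if (bad & (UINT32_C(1) << bit))
            fprintf(stderr, "%s[%u]: fixed to 1 by TDX, unsupported by KVM\n",
                    reg, bit);
    return bad ? -1 : 0;
}
```

For MTRR this would flag bit 12 of CPUID(1).EDX up front, rather than discovering the mismatch on the guest's first MTRR MSR access.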