> On 04/12/2023 10:17, Reshetova, Elena wrote:
> >> Check for additional CPUID bits to identify TDX guests running with Trust
> >> Domain (TD) partitioning enabled. TD partitioning is like nested virtualization
> >> inside the Trust Domain so there is a L1 TD VM(M) and there can be L2 TD
> >> VM(s).
> >>
> >> In this arrangement we are not guaranteed that the TDX_CPUID_LEAF_ID is
> >> visible to Linux running as an L2 TD VM. This is because a majority of TDX
> >> facilities are controlled by the L1 VMM and the L2 TDX guest needs to use TD
> >> partitioning aware mechanisms for what's left. So currently such guests do
> >> not have X86_FEATURE_TDX_GUEST set.
> >
> > Back to this concrete patch. Why cannot L1 VMM emulate the correct value of
> > the TDX_CPUID_LEAF_ID to L2 VM? It can do this per TDX partitioning arch.
> > How do you handle this and other CPUID calls call currently in L1? Per spec,
> > all CPUIDs calls from L2 will cause L2 --> L1 exit, so what do you do in L1?
>
> The disclaimer here is that I don't have access to the paravisor (L1) code. But
> to the best of my knowledge the L1 handles CPUID calls by calling into the TDX
> module, or synthesizing a response itself. TDX_CPUID_LEAF_ID is not provided to
> the L2 guest in order to discriminate a guest that is solely responsible for every
> TDX mechanism (running at L1) from one running at L2 that has to cooperate
> with L1.
> More below.

OK, so in your case it is a decision of the L1 VMM not to set TDX_CPUID_LEAF_ID to reflect that it is a TDX guest, and it does so on purpose, because you want to drop into a special TDX guest, i.e. a partitioned guest.

> >
> > Given that you do that simple emulation, you already end up with TDX guest
> > code being activated. Next you can check what features you wont be able to
> > provide in L1 and create simple emulation calls for the TDG calls that must be
> > supported and cannot return error.
> > The biggest TDG call (TDVMCALL) is already direct call into L0 VMM, so this
> > part doesn't require L1 VMM support.
>
> I don't see anything in the TD-partitioning spec that gives the TDX guest a way
> to detect if it's running at L2 or L1, or check whether TDVMCALLs go to L0/L1.
> So in any case this requires an extra cpuid call to establish the environment.
> Given that, exposing TDX_CPUID_LEAF_ID to the guest doesn't help.

Yes, there is nothing like this in the spec, and that is on purpose: the idea is that L1 can fully control the environment for L2 and virtualize it in whatever way it wants.

>
> I'll give some examples of where the idea of emulating a TDX environment
> without attempting L1-L2 cooperation breaks down.
>
> hlt: if the guest issues a hlt TDVMCALL it goes to L0, but if it issues a classic hlt
> it traps to L1. The hlt should definitely go to L1 so that L1 has a chance to do
> housekeeping.
>
> map gpa: say the guest uses MAP_GPA TDVMCALL. This goes to L0, not L1 which
> is the actual entity that needs to have a say in performing the conversion. L1
> can't act on the request if L0 would forward it because of the CoCo threat
> model. So L1 and L2 get out of sync.
> The only safe approach is for L2 to use a different mechanism to trap to L1
> explicitly.

Interesting, thank you for the examples! It looks to me that if we give the L1 VMM the ability to specify which TDVMCALL leaves should go to L0 and which ones should end up in L1, we would address your use case more cleanly, without fragmenting the TDX guest code. Is that a viable option for you?
I do understand that such an option doesn't exist at the moment, but if there is a good use case, we can argue that it is needed.

Best Regards,
Elena.
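For readers less familiar with the mechanics: the detection debated in this thread is a plain CPUID check, and the proposal at the end amounts to an L1-programmed routing table for TDVMCALL leaves. The sketch below illustrates both under stated assumptions; it is not kernel code. It assumes the leaf number (0x21) and the "IntelTDX    " signature, laid out across EBX, EDX and ECX, that the Linux TDX guest code uses. In the kernel the register values come from `cpuid_count()`; here they are passed in so the check can be exercised anywhere. Everything on the routing side (`tdvmcall_routing`, the leaf enumeration, the helper names) is hypothetical, since, as noted above, no such per-leaf routing option exists today.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* CPUID leaf and signature as used by the Linux TDX guest code. */
#define TDX_CPUID_LEAF_ID 0x21
#define TDX_IDENT "IntelTDX    "

/*
 * Return true if the registers returned by CPUID(TDX_CPUID_LEAF_ID, 0)
 * carry the TDX signature. The 12 bytes are spread across EBX, EDX and
 * ECX, in that order. An L1 VMM that wants L2 to take partitioning-aware
 * paths can simply not report this signature.
 */
static bool is_tdx_signature(uint32_t ebx, uint32_t ecx, uint32_t edx)
{
	uint32_t sig[3] = { ebx, edx, ecx };

	return memcmp(sig, TDX_IDENT, sizeof(sig)) == 0;
}

/* HYPOTHETICAL: where an L2 TDVMCALL would land. Not a TDX module feature. */
enum tdvmcall_target { ROUTE_TO_L0, ROUTE_TO_L1 };

/* HYPOTHETICAL leaf indices for illustration, not the GHCI leaf numbers. */
enum tdvmcall_leaf {
	TDVMCALL_LEAF_HLT,
	TDVMCALL_LEAF_MAP_GPA,
	TDVMCALL_LEAF_IO,
	TDVMCALL_LEAF_NR,
};

/* HYPOTHETICAL per-L2 policy programmed by L1: bit N set => leaf N goes to L1. */
struct tdvmcall_routing {
	uint64_t to_l1_mask;
};

/* L1 claims a leaf so that L2 requests of that kind trap to L1. */
static void tdvmcall_route_to_l1(struct tdvmcall_routing *rt,
				 enum tdvmcall_leaf leaf)
{
	rt->to_l1_mask |= UINT64_C(1) << leaf;
}

/* Unclaimed leaves keep the existing behavior: a direct call into L0. */
static enum tdvmcall_target tdvmcall_route(const struct tdvmcall_routing *rt,
					   enum tdvmcall_leaf leaf)
{
	return ((rt->to_l1_mask >> leaf) & 1) ? ROUTE_TO_L1 : ROUTE_TO_L0;
}
```

In this sketch L1 would claim the HLT and MAP_GPA leaves (the two cases raised above) and leave everything else on the direct L0 path, so the L2 guest could keep issuing ordinary TDVMCALLs from the regular TDX guest code while L1 still sees the requests it must act on.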