> > I think we are lacking background of this usage model and how it works. For
> > instance, typically L2 is created by L1, and L1 is responsible for L2's device
> > I/O emulation. I don't quite understand how could L0 emulate L2's device I/O?
> >
> > Can you provide more information?
>
> Let's differentiate between fast and slow I/O. The whole point of the paravisor in
> L1 is to provide device emulation for slow I/O: TPM, RTC, NVRAM, IO-APIC, serial ports.
>
> But fast I/O is designed to bypass it and go straight to L0. Hyper-V uses paravirtual
> vmbus devices for fast I/O (net/block). The vmbus protocol has awareness of page visibility
> built-in and uses native (GHCI on TDX, GHCB on SNP) mechanisms for notifications. So once
> everything is set up (rings/buffers in swiotlb), the I/O for fast devices does not
> involve L1. This is only possible when the VM manages C-bit itself.

Yeah that makes sense. Thanks for the info.

>
> I think the same thing could work for virtio if someone would "enlighten" vring
> notification calls (instead of I/O or MMIO instructions).
>
> >
> > >
> > > >
> > > > >
> > > > > Whats missing is the tdx_guest flag is not exposed to userspace in /proc/cpuinfo,
> > > > > and as a result dmesg does not currently display:
> > > > > "Memory Encryption Features active: Intel TDX".
> > > > >
> > > > > That's what I set out to correct.
> > > > >
> > > > > > So far I see that you try to get kernel think that it runs as TDX guest,
> > > > > > but not really. This is not very convincing model.
> > > > > >
> > > > >
> > > > > No that's not accurate at all. The kernel is running as a TDX guest so I
> > > > > want the kernel to know that.
> > > > >
> > > >
> > > > But it isn't. It runs on a hypervisor which is a TDX guest, but this doesn't
> > > > make itself a TDX guest.
> > > >
> > >
> > > That depends on your definition of "TDX guest".
> > > The TDX 1.5 TD partitioning spec talks of TDX-enlightened L1 VMM, (optionally)
> > > TDX-enlightened L2 VM and Unmodified Legacy L2 VM. Here we're dealing with a
> > > TDX-enlightened L2 VM.
> > >
> > > If a guest runs inside an Intel TDX protected TD, is aware of memory encryption and
> > > issues TDVMCALLs - to me that makes it a TDX guest.
> >
> > The thing I don't quite understand is what enlightenment(s) requires L2 to issue
> > TDVMCALL and know "encryption bit".
> >
> > The reason that I can think of is:
> >
> > If device I/O emulation of L2 is done by L0 then I guess it's reasonable to make
> > L2 aware of the "encryption bit" because L0 can only write emulated data to
> > shared buffer. The shared buffer must be initially converted by the L2 by using
> > MAP_GPA TDVMCALL to L0 (to zap private pages in S-EPT etc), and L2 needs to know
> > the "encryption bit" to set up its page table properly. L1 must be aware of
> > such private <-> shared conversion too to setup page table properly so L1 must
> > also be notified.
>
> Your description is correct, except that L2 uses a hypercall (hv_mark_gpa_visibility())
> to notify L1 and L1 issues the MAP_GPA TDVMCALL to L0.

In TDX partitioning, IIUC, L1 and L2 use different secure-EPT page tables when
mapping the GPAs of L1 and L2. Therefore, IIUC, the entries in both secure-EPT
tables that map the "to be converted" page need to be zapped.

I am not entirely sure whether using hv_mark_gpa_visibility() is sufficient. If the
MAP_GPA comes from L1, I am not sure it is easy for L0 to zap the secure-EPT entry
for L2. But anyway, these are details we probably don't need to consider.

>
> C-bit awareness is necessary to setup the whole swiotlb pool to be host visible for
> DMA.

Agreed.

> >
> > The concern I am having is whether there's other usage model(s) that we need to
> > consider. For instance, running both unmodified L2 and enlightened L2. Or some
> > L2 only needs TDVMCALL enlightenment but no "encryption bit".
> >
>
> Presumably unmodified L2 and enlightened L2 are already covered by current code but
> require excessive trapping to L1.
>
> I can't see a usecase for TDVMCALLs but no "encryption bit".
>
> > In other words, that seems pretty much L1 hypervisor/paravisor implementation
> > specific. I am wondering whether we can completely hide the enlightenment(s)
> > logic to hypervisor/paravisor specific code but not generically mark L2 as TDX
> > guest but still need to disable TDCALL sort of things.
>
> That's how it currently works - all the enlightenments are in hypervisor/paravisor
> specific code in arch/x86/hyperv and drivers/hv and the vm is not marked with
> X86_FEATURE_TDX_GUEST.

And I believe there's a reason that the VM is not marked as a TDX guest.

>
> But without X86_FEATURE_TDX_GUEST userspace has no unified way to discover that an
> environment is protected by TDX and also the VM gets classified as "AMD SEV" in dmesg.
> This is due to CC_ATTR_GUEST_MEM_ENCRYPT being set but X86_FEATURE_TDX_GUEST not.

Can you provide more information about what _userspace_ does here? What's the
difference if it sees a TDX guest or a normal non-CoCo guest in /proc/cpuinfo?

It looks like the whole purpose of this series is to make userspace happy by
advertising the TDX guest in /proc/cpuinfo. But if we do that, we get bad side
effects in the kernel, which is why we need to do the things in your patch 2/3.
That doesn't seem very convincing.

Is there any other mechanism that userspace can use, e.g., any HV
hypervisor/paravisor specific attributes that are exposed to userspace?
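
For illustration only, a userspace check keyed on the cpuinfo flag might look
roughly like the sketch below. It assumes the usual x86 /proc/cpuinfo layout with
a "flags" line and the "tdx_guest" flag name mentioned earlier in the thread; the
helper names cpu_flags() and is_tdx_guest() are made up for this example and are
not taken from any existing tool.

```python
def cpu_flags(cpuinfo_text):
    """Return the flags on the first 'flags' line of /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()  # no 'flags' line found (assumption: treat as no flags)


def is_tdx_guest(cpuinfo_text):
    """Hypothetical helper: True if the 'tdx_guest' flag is advertised."""
    return "tdx_guest" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
    except OSError:
        text = ""  # not on Linux, or /proc is unavailable
    print("tdx_guest flag present" if is_tdx_guest(text) else "no tdx_guest flag")
```

A check like this only sees what the kernel chooses to advertise, which is why
the question of what userspace actually does with the result matters here.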