Re: [PATCH v1 1/3] x86/tdx: Check for TDX partitioning during early TDX init

"Huang, Kai" <kai.huang@xxxxxxxxx> · Thu, 7 Dec 2023 12:58:53 +0000

> 
> > I think we are lacking background on this usage model and how it works.  For
> > instance, L2 is typically created by L1, and L1 is responsible for L2's device
> > I/O emulation.  I don't quite understand how L0 could emulate L2's device I/O.
> > 
> > Can you provide more information?
> 
> Let's differentiate between fast and slow I/O. The whole point of the paravisor in
> L1 is to provide device emulation for slow I/O: TPM, RTC, NVRAM, IO-APIC, serial ports.
> 
> But fast I/O is designed to bypass it and go straight to L0. Hyper-V uses paravirtual
> vmbus devices for fast I/O (net/block). The vmbus protocol has awareness of page visibility
> built-in and uses native (GHCI on TDX, GHCB on SNP) mechanisms for notifications. So once
> everything is set up (rings/buffers in swiotlb), the I/O for fast devices does not
> involve L1. This is only possible when the VM manages the C-bit itself.

Yeah that makes sense.  Thanks for the info.
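
A toy C model of the fast/slow split described above, for illustration only.  The
device names, the enum and route_io() are hypothetical.  The only rule encoded is
the one stated in the thread: emulated devices go through the L1 paravisor, and
vmbus-style fast I/O reaches L0 directly only if L2 manages the C-bit itself.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy model only: device names and routing rule are illustrative. */
enum io_path { IO_VIA_L1_PARAVISOR, IO_DIRECT_TO_L0 };

enum io_path route_io(const char *device, bool l2_manages_cbit)
{
    /* Slow I/O: devices emulated by the paravisor in L1. */
    static const char *const emulated[] = {
        "tpm", "rtc", "nvram", "ioapic", "serial",
    };

    for (size_t i = 0; i < sizeof(emulated) / sizeof(emulated[0]); i++)
        if (strcmp(device, emulated[i]) == 0)
            return IO_VIA_L1_PARAVISOR;

    /* Fast (vmbus-style) I/O can bypass L1 only if L2 itself can make
     * its rings/buffers shared with L0 (assumption: otherwise L1 is
     * still involved). */
    return l2_manages_cbit ? IO_DIRECT_TO_L0 : IO_VIA_L1_PARAVISOR;
}
```

The fallback to L1 when L2 does not manage the C-bit is an assumption of this
model, not something stated for the real Hyper-V implementation.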

> 
> I think the same thing could work for virtio if someone would "enlighten" vring
> notification calls (instead of I/O or MMIO instructions).
> 
> > 
> > > 
> > > > 
> > > > > 
> > > > > What's missing is that the tdx_guest flag is not exposed to userspace in /proc/cpuinfo,
> > > > > and as a result dmesg does not currently display:
> > > > > "Memory Encryption Features active: Intel TDX".
> > > > > 
> > > > > That's what I set out to correct.
> > > > > 
> > > > > > So far I see that you try to make the kernel think that it runs as a TDX guest,
> > > > > > but it does not really.  This is not a very convincing model.
> > > > > > 
> > > > > 
> > > > > No, that's not accurate at all. The kernel is running as a TDX guest, so I
> > > > > want the kernel to know that.
> > > > > 
> > > > 
> > > > But it isn't.  It runs on a hypervisor which is a TDX guest, but this doesn't
> > > > make it a TDX guest itself.
> > > 
> > > That depends on your definition of "TDX guest". The TDX 1.5 TD partitioning spec
> > > talks of TDX-enlightened L1 VMM, (optionally) TDX-enlightened L2 VM and Unmodified
> > > Legacy L2 VM. Here we're dealing with a TDX-enlightened L2 VM.
> > > 
> > > If a guest runs inside an Intel TDX protected TD, is aware of memory encryption and
> > > issues TDVMCALLs - to me that makes it a TDX guest.
> > 
> > The thing I don't quite understand is what enlightenment(s) require L2 to issue
> > TDVMCALLs and know the "encryption bit".
> > 
> > The reason that I can think of is:
> > 
> > If device I/O emulation of L2 is done by L0, then I guess it's reasonable to make
> > L2 aware of the "encryption bit", because L0 can only write emulated data to a
> > shared buffer.  The shared buffer must initially be converted by L2 using the
> > MAP_GPA TDVMCALL to L0 (to zap private pages in the S-EPT, etc.), and L2 needs to
> > know the "encryption bit" to set up its page table properly.  L1 must be aware of
> > such private <-> shared conversions too, to set up its page table properly, so L1
> > must also be notified.
> 
> Your description is correct, except that L2 uses a hypercall (hv_mark_gpa_visibility())
> to notify L1 and L1 issues the MAP_GPA TDVMCALL to L0.
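
To make the "encryption bit" point concrete: in TDX the shared bit is, as I
understand it, the top bit of the guest physical address width (bit 47 with a
48-bit GPAW, bit 51 with a 52-bit GPAW).  The helpers below are a hypothetical
sketch, not kernel code, assuming that layout; they show how an L2 that knows its
GPA width would compute the alias it must map once a page changes visibility.

```c
#include <stdint.h>

/* Hypothetical helpers. Assumes the TDX shared bit is GPA bit
 * (gpa_width - 1) and that gpa_width is in the range 1..64. */
uint64_t shared_bit_mask(unsigned int gpa_width)
{
    return 1ULL << (gpa_width - 1);
}

/* GPA that L2 maps after converting a page private -> shared. */
uint64_t gpa_to_shared(uint64_t gpa, unsigned int gpa_width)
{
    return gpa | shared_bit_mask(gpa_width);
}

/* GPA that L2 maps after converting a page back to private. */
uint64_t gpa_to_private(uint64_t gpa, unsigned int gpa_width)
{
    return gpa & ~shared_bit_mask(gpa_width);
}
```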

In TDX partitioning, IIUC, L1 and L2 use different secure-EPT page tables when
mapping the GPAs of L1 and L2.  Therefore, IIUC, the entries in both secure-EPT
tables which map the "to be converted" page need to be zapped.

I am not entirely sure whether using hv_mark_gpa_visibility() is sufficient,
because if the MAP_GPA comes from L1, I am not sure L0 can easily zap the
secure-EPT entry for L2.

But anyway, these are probably details we don't need to consider.

> 
> C-bit awareness is necessary to set up the whole swiotlb pool to be host visible for
> DMA.

Agreed.
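
Making "the whole swiotlb pool" host visible amounts to converting every page of
a physical range, typically in batches per hypercall.  A hypothetical sketch of
that loop: mark_range_visible(), the callback type and the batch size are
assumptions.  The callback stands in for an L2 -> L1 visibility hypercall such as
hv_mark_gpa_visibility(), whose real signature and batch limit are not modeled.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12

/* Hypothetical stand-in for an L2 -> L1 visibility hypercall; it
 * receives a batch of consecutive PFNs and returns 0 on success. */
typedef int (*visibility_fn)(uint64_t first_pfn, size_t count, void *ctx);

/* Marks every page touching [base, base + size) host visible, passing
 * at most max_batch pages per call. Returns 0 or the first error. */
int mark_range_visible(uint64_t base, uint64_t size, size_t max_batch,
                       visibility_fn fn, void *ctx)
{
    uint64_t first = base >> TOY_PAGE_SHIFT;
    /* Round the end up so a partially covered last page is included. */
    uint64_t last = (base + size + (1ULL << TOY_PAGE_SHIFT) - 1) >> TOY_PAGE_SHIFT;

    if (max_batch == 0)
        return -1;
    while (first < last) {
        uint64_t remaining = last - first;
        size_t n = remaining < max_batch ? (size_t)remaining : max_batch;
        int ret = fn(first, n, ctx);

        if (ret)
            return ret;
        first += n;
    }
    return 0;
}

/* Example callback for trying the loop out: only counts calls/pages. */
struct batch_stats { size_t calls; uint64_t pages; };

int count_batches(uint64_t first_pfn, size_t count, void *ctx)
{
    struct batch_stats *s = ctx;

    (void)first_pfn;
    s->calls++;
    s->pages += count;
    return 0;
}
```

For a 40 KiB range at 0x100000 with max_batch = 4, this issues three calls
(4 + 4 + 2 pages); an unaligned range is rounded out to whole pages.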

> 
> > 
> > The concern I have is whether there are other usage model(s) that we need to
> > consider.  For instance, running both an unmodified L2 and an enlightened L2.  Or
> > an L2 that only needs the TDVMCALL enlightenment but not the "encryption bit".
> > 
> 
> Presumably unmodified L2 and enlightened L2 are already covered by the current code, but
> they require excessive trapping to L1.
> 
> I can't see a use case for TDVMCALLs without the "encryption bit".
> 
> > In other words, that seems pretty much specific to the L1 hypervisor/paravisor
> > implementation.  I am wondering whether we can completely hide the enlightenment
> > logic in hypervisor/paravisor-specific code, rather than generically marking L2 as
> > a TDX guest and then still needing to disable TDCALL and that sort of thing.
> 
> That's how it currently works - all the enlightenments are in hypervisor/paravisor-
> specific code in arch/x86/hyperv and drivers/hv, and the VM is not marked with
> X86_FEATURE_TDX_GUEST.

And I believe there's a reason that the VM is not marked as TDX guest.

> 
> But without X86_FEATURE_TDX_GUEST, userspace has no unified way to discover that an
> environment is protected by TDX, and the VM also gets classified as "AMD SEV" in dmesg.
> This is due to CC_ATTR_GUEST_MEM_ENCRYPT being set while X86_FEATURE_TDX_GUEST is not.
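
A simplified C model of why the banner ends up saying "AMD SEV".  The two inputs
stand in for cpu_feature_enabled(X86_FEATURE_TDX_GUEST) and
cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT).  This only paraphrases the decision
order described above; it is not the kernel's banner code (as I recall,
print_mem_encrypt_feature_info() in arch/x86/mm/mem_encrypt.c), which also checks
other attributes, e.g. SME, that are left out here.

```c
#include <stdbool.h>

/* Simplified model of the "Memory Encryption Features active:" banner,
 * not kernel source. */
enum mem_encrypt_banner { BANNER_INTEL_TDX, BANNER_AMD_SEV, BANNER_AMD_OTHER };

enum mem_encrypt_banner classify_banner(bool tdx_guest_feature,
                                        bool guest_mem_encrypt)
{
    /* Only the X86_FEATURE_TDX_GUEST check selects the Intel branch. */
    if (tdx_guest_feature)
        return BANNER_INTEL_TDX;
    /* Everything else is reported under "AMD"; guest memory
     * encryption then shows up as "SEV". */
    if (guest_mem_encrypt)
        return BANNER_AMD_SEV;
    return BANNER_AMD_OTHER;
}
```

A partitioned L2 as described above corresponds to classify_banner(false, true).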

Can you provide more information about what _userspace_ does here?

What difference does it make whether it sees a TDX guest or a normal non-coco
guest in /proc/cpuinfo?
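
For reference, the kind of userspace check under discussion is a plain token
match on the "flags" line of /proc/cpuinfo, where X86_FEATURE_TDX_GUEST shows up,
to my understanding, as "tdx_guest".  A minimal sketch; cpuinfo_has_flag() is a
hypothetical helper, and reading the file itself is left out.

```c
#include <stdbool.h>
#include <string.h>

/* Returns true if `flag` appears as a whole, whitespace-separated token
 * on an x86 /proc/cpuinfo "flags" line, e.g. "flags\t\t: fpu ... tdx_guest". */
bool cpuinfo_has_flag(const char *line, const char *flag)
{
    const char *p = strchr(line, ':');
    size_t len = strlen(flag);

    if (strncmp(line, "flags", 5) != 0 || p == NULL || len == 0)
        return false;
    for (p++; *p != '\0'; ) {
        /* Skip separators, then measure the next token. */
        while (*p == ' ' || *p == '\t' || *p == '\n')
            p++;
        const char *start = p;
        while (*p != '\0' && *p != ' ' && *p != '\t' && *p != '\n')
            p++;
        if ((size_t)(p - start) == len && strncmp(start, flag, len) == 0)
            return true;
    }
    return false;
}
```

A caller would feed it each line read with fgets() from /proc/cpuinfo.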

It looks like the whole purpose of this series is to make userspace happy by
advertising a TDX guest in /proc/cpuinfo.  But if we do that, we get bad side
effects in the kernel, which is why we need to do the things in your patch 2/3.

That doesn't seem very convincing.  Is there any other mechanism userspace can
use, e.g., HV hypervisor/paravisor-specific attributes that are exposed to
userspace?