Re: TDISP enablement

Samuel Ortiz <sameo@xxxxxxxxxxxx> · Mon, 13 Nov 2023 16:10:27 +0100

On Mon, Nov 13, 2023 at 05:46:35PM +1100, Alexey Kardashevskiy wrote:
> 
> On 13/11/23 16:43, Samuel Ortiz wrote:
> > Hi Alexey,
> > 
> > On Wed, Nov 01, 2023 at 09:56:11AM +1100, Alexey Kardashevskiy wrote:
> > > Hi everyone,
> > > 
> > > Here is a follow-up after Dan's community call we had weeks ago.
> > > 
> > > Our (AMD) goal at the moment is TDISP to pass through SRIOV VFs to
> > > confidential VMs without trusting the HV and with enabled IDE (encryption)
> > > and IOMMU (performance, compared to current SWIOTLB). I am aware of other
> > > uses and vendors and I spent hours unsuccessfully trying to generalize all
> > > this in a meaningful way.
> > > 
> > > The AMD SEV TIO verbs can be simplified as:
> > > 
> > > - device_connect - starts CMA/SPDM session, returns measurements/certs, runs
> > > IDE_KM to program the keys;
> > > - device_reclaim - undo the connect;
> > > - tdi_bind - transition the TDI to TDISP's LOCKED and RUN states, generates
> > > interface report;
> > 
> >  From a VF to TVM use case, I think tdi_bind should only transition to
> > LOCKED, but not RUN. RUN should only be reached once the TVM approves
> > the device, and afaiu this is a host call.
> 
> What is the point in separating these? What is that thing which requires the
> device to be in LOCKED but not RUN state (besides the obvious
> START_INTERFACE_REQUEST)?

Because they're two very different steps of the TDI assignment into a
TVM.
The TDI moves to RUN when the TVM accepts it into its TCB.
LOCKED is typically driven by the host, in order to lock the TDI
configuration while the TVM verifies and attests it, and then accepts
it into or rejects it from its TCB.

When the TSM moves the TDI to RUN, by TVM request, all IO paths (DMA and
MMIO) are supposed to be functional. I understand most architectures
have ways to prevent TDIs from accessing confidential memory
regardless of their TDISP state, but a TDI in the RUN state should not
be forbidden from DMA'ing the TVM confidential memory. Preventing it
from doing so should be an error case, not the nominal flow.
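
To make the split concrete, here is a rough sketch of the transitions
as I see them, with the actor that is allowed to request each one. This
is only an illustration: the names are made up, the ERROR state is left
out, and it is not taken from any TSM implementation.

```python
from enum import Enum


class TdiState(Enum):
    UNLOCKED = "UNLOCKED"
    LOCKED = "LOCKED"
    RUN = "RUN"


class Actor(Enum):
    HOST = "host"
    TVM = "tvm"


# (current state, requested state) -> actor allowed to request it.
# Hypothetical table for illustration; the TSM carries out the change.
ALLOWED = {
    (TdiState.UNLOCKED, TdiState.LOCKED): Actor.HOST,  # bind
    (TdiState.LOCKED, TdiState.RUN): Actor.TVM,        # TVM accepts the TDI
    (TdiState.LOCKED, TdiState.UNLOCKED): Actor.HOST,  # unbind
    (TdiState.RUN, TdiState.UNLOCKED): Actor.HOST,     # unbind
}


def transition(state, target, actor):
    """Return the new state, or raise if this actor may not request it."""
    if ALLOWED.get((state, target)) is not actor:
        raise PermissionError(
            f"{actor.value} may not move a TDI from {state.value} to {target.value}"
        )
    return target
```

With this table the host can lock the TDI, but only the TVM can ask for
RUN, which is the point of keeping the two steps apart.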

> > > - tdi_info - read measurements/certs/interface report;
> > > - tdi_validate - unlock TDI's MMIO and IOMMU (or invalidate, depends on the
> > > parameters).
> > 
> > That's equivalent to the TVM accepting the TDI, and this should
> > transition the TDI from LOCKED to RUN.
> 
> Even if the device was in RUN, it would not work until the validation is
> done == RMP+IOMMU are updated by the TSM. 

Right, and that makes sense from a security perspective. But a device in
the RUN state will expect IO to work, because in TDISP semantics RUN
means the TVM has accepted the device and, as such, allowed it access
to its confidential memory.

> This may be different for other
> architectures though, dunno. RMP == reverse map table, an SEV SNP thing used
> for verifying memory accesses.
> 
> 
> > > The first 4 called by the host OS, the last two by the TVM ("Trusted VM").
> > > These are implemented in the AMD PSP (platform processor).
> > > There are CMA/SPDM, IDE_KM, TDISP in use.
> > > 
> > > Now, my strawman code does this on the host (I simplified a bit):
> > > - after PCI discovery but before probing: walk through all TDISP-capable
> > > (TEE-IO in PCIe caps) endpoint devices and call device_connect;
> > 
> > Would the host call device_connect unconditionally for all TEE-IO device
> > probed on the host? Wouldn't you want to do so only before the first
> > tdi_bind for a TDI that belongs to the physical device?
> 
> 
> Well, in the SEV TIO, device_connect enables IDE which has value for the
> host on its own.

Ok, that makes sense to me. And the TSM would be responsible for
supporting this. Then TDISP is exercised on a particular TDI for the
device when this TDI is passed through to a specific TVM.

> 
> > > - when drivers probe - it is all set up and the device measurements are
> > > visible to the driver;
> > > - when constructing a TVM, tdi_bind is called;
> > 
> > Here as well, the tdi_bind could be asynchronous to e.g. support hot
> > plugging TDIs into TVMs.
> 
> 
> I do not really see a huge difference between starting a VM with already
> bound TDISP device or hotplugging a device - either way the host calls
> tdi_bind and it does not really care about what the guest is doing at that
> moment and when the guest sees a TDISP device - it is always bound.

I agree. What I meant is that bind can be called at TVM construction
time, or asynchronously whenever the host decides to attach a TDI to the
previously constructed TVM.

> > > and then in the TVM:
> > > - after PCI discovery but before probing: walk through all TDIs (which will
> > > have TEE IO bit set) and call tdi_info, verify the report, if ok - call
> > > tdi_validate;
> > 
> > By verify you mean verify the reported MMIO ranges? With support from
> > the TSM?
> 
> The tdi_validate call to the PSP FW (==TSM) asks the PSP to validate the
> MMIO values and enable them in the RMP.

Sounds good.

> > We discussed that a few times, but the device measurements and
> > attestation report should also be attested, i.e. run against a relying
> > party. The kernel may not be the right place for that, and I'm proposing
> > for the guest kernel to rely on a user space component and offload the
> > attestation part to it. This userspace component would then
> > synchronously return to the guest kernel with an attestation result.
> 
> What bothers me here is that the userspace works when PCI is probed so when
> the userspace is called for attestation - the device is up and running and
> hosting the rootfs.

I guess you're talking about a use case where one would pass a storage
device through, and that device would hold the guest rootfs?
With the approach we're proposing, attestation would be optional and
at the kernel's discretion. In that case, the kernel would not require
userspace to run attestation (because there is no userspace...) but the
actual guest attestation would still happen whenever the guest would
want to fetch an attestation-gated secret. And that attestation flow
would include the storage device attestation report, because it's part
of the guest TCB. So, eventually, the device would be attested, but not
right when the device is attached to the guest.

> The userspace will need a knob which transitions the
> device into the trusted state (switch SWIOTLB to direct DMA, for example). I
> guess if the userspace is initramdisk, it could still reload the driver
> which is not doing useful work just yet...
> 
> 
> > > - when drivers probe - it is all set up and the driver decides if/which DMA
> > > mode to use (SWIOTLB or direct), or panic().
> > > 
> > 
> > When would it panic?
> 
> When attestation failed.

Attestation failure should only trigger a rejection from the TVM, i.e.
the TDI would not be probed. That should be reported back to the host,
which may decide to call unbind on that TDI (and thus move it back to
UNLOCKED).
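
As a rough sketch of that rejection path (all names are hypothetical,
and the unbind-on-reject policy is just one choice the host could make):

```python
from dataclasses import dataclass


@dataclass
class Tdi:
    state: str = "LOCKED"  # assumed: the host has already bound the TDI
    probed: bool = False


def tvm_evaluate(tdi, attestation_ok):
    """Guest side: accept or reject the TDI; a failure is not fatal."""
    if not attestation_ok:
        # No driver probe and no panic; the verdict goes back to the host.
        return "rejected"
    tdi.state = "RUN"
    tdi.probed = True
    return "accepted"


def host_handle_verdict(tdi, verdict, unbind_on_reject=True):
    """Host side: decide what to do with a TDI the guest rejected."""
    if verdict == "rejected" and unbind_on_reject:
        tdi.state = "UNLOCKED"  # unbind
```

A rejected TDI stays LOCKED and unprobed until the host acts on the
verdict, and the guest keeps running either way.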

> > > Uff. Too long already. Sorry. Now, go to the problems:
> > > 
> > > If the user wants only CMA/SPDM,
> > 
> > By user here, you mean the user controlling the host? Or the TVM
> > user/owner? I assume the former.
> 
> Yes, the physical host owner.
> 
> > > Lukas's patches will do that without
> > > the PSP. This may co-exist with the AMD PSP (if the endpoint allows multiple
> > > sessions).
> > > 
> > > If the user wants only IDE, the AMD PSP's device_connect needs to be called
> > > and the host OS does not get to know the IDE keys. Other vendors allow
> > > programming IDE keys to the RC on the baremetal, and this also may co-exist
> > > with a TSM running outside of Linux - the host still manages traffic classes
> > > and streams.
> > > 
> > > If the user wants TDISP for VMs, this assumes the user does not trust the
> > > host OS and therefore the TSM (which is trusted) has to do CMA/SPDM and IDE.
> > > 
> > > The TSM code is not Linux and not shared among vendors. CMA/SPDM and IDE
> > > seem capable of co-existing, TDISP does not.
> > 
> > Which makes sense, TDISP is not designed to be used outside of the
> > TEE-IO VFs assigned to TVM use case.
> > 
> > > 
> > > However there are common bits.
> > > - certificates/measurements/reports blobs: storing, presenting to the
> > > userspace (results of device_connect and tdi_bind);
> > > - place where we want to authenticate the device and enable IDE
> > > (device_connect);
> > > - place where we want to bind TDI to a TVM (tdi_bind).
> > > 
> > > I've tried to address this with my (poorly named) drivers/pci/pcie/tdisp.ko
> > > and a hack for VFIO PCI device to call tdi_bind.
> > > 
> > > The next steps:
> > > - expose blobs via configfs (like Dan did configfs-tsm);
> > > - s/tdisp.ko/coco.ko/;
> > > - ask the audience - what is missing to make it reusable for other vendors
> > > and uses?
> > 
> > The connect-bind-run flow is similar to the one we have defined for
> > RISC-V [1]. There we are defining the TEE-IO flows for RISC-V in
> > details, but nothing there is architectural and could somehow apply to
> > other architectures.
> 
> Yeah, it is good one!

Thanks. Comments and improvement proposals are welcome.

> I am still missing the need to have sbi_covg_start_interface() as a separate
> step though. Thanks,

Just to reiterate: start_interface is a guest call into the TSM, to let
it know that it accepts the TDI. That makes the TSM do two things:

1. Enable the MMIO and DMA mappings.
2. Move the TDI to RUN.

After that call, the TDI is usable from a TVM perspective. Before that
call it is not, but its configuration and state are locked.
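
In toy form, and assuming a made-up per-TDI context kept by the TSM
(none of these names come from the SBI proposal or from any TSM code),
the call would look something like this:

```python
class TsmTdiContext:
    """Hypothetical per-TDI bookkeeping inside the TSM."""

    def __init__(self):
        self.state = "LOCKED"      # assumed: the host has already bound it
        self.mmio_enabled = False
        self.dma_enabled = False

    def usable_from_tvm(self):
        """The TVM can use the TDI only once it is in RUN with IO enabled."""
        return self.state == "RUN" and self.mmio_enabled and self.dma_enabled

    def start_interface(self):
        """Guest call: the TVM accepts the TDI."""
        if self.state != "LOCKED":
            raise RuntimeError("TDI must be LOCKED before start_interface")
        # 1. Enable the MMIO and DMA mappings.
        self.mmio_enabled = True
        self.dma_enabled = True
        # 2. Move the TDI to RUN.
        self.state = "RUN"
```

Before start_interface() the context is LOCKED with no mappings; after
it, usable_from_tvm() holds.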

Cheers,
Samuel.