On 13/11/23 16:43, Samuel Ortiz wrote:
Hi Alexey,
On Wed, Nov 01, 2023 at 09:56:11AM +1100, Alexey Kardashevskiy wrote:
Hi everyone,
Here is a follow-up to Dan's community call we had weeks ago.
Our (AMD) goal at the moment is to use TDISP to pass SR-IOV VFs through to
confidential VMs without trusting the HV, with IDE (encryption) and IOMMU
(performance, compared to the current SWIOTLB) enabled. I am aware of other
uses and vendors, and I spent hours unsuccessfully trying to generalize all
this in a meaningful way.
The AMD SEV TIO verbs can be simplified as:
- device_connect - starts a CMA/SPDM session, returns measurements/certs,
runs IDE_KM to program the keys;
- device_reclaim - undo the connect;
- tdi_bind - transitions the TDI to TDISP's LOCKED and RUN states, generates
an interface report;
For the VF-to-TVM use case, I think tdi_bind should only transition to
LOCKED, but not RUN. RUN should only be reached once the TVM approves
the device, and afaiu this is a host call.
What is the point in separating these? What is that thing which requires
the device to be in LOCKED but not RUN state (besides the obvious
START_INTERFACE_REQUEST)?
- tdi_unbind - undo the bind;
- tdi_info - read measurements/certs/interface report;
- tdi_validate - unlocks the TDI's MMIO and IOMMU (or invalidates them,
depending on the parameters).
That's equivalent to the TVM accepting the TDI, and this should
transition the TDI from LOCKED to RUN.
Even if the device was in RUN, it would not work until the validation is
done == RMP+IOMMU are updated by the TSM. This may be different for
other architectures though, dunno. RMP == reverse map table, an SEV SNP
thing used for verifying memory accesses.
The first four are called by the host OS, the last two by the TVM
("Trusted VM"). These are implemented in the AMD PSP (Platform Security
Processor). CMA/SPDM, IDE_KM and TDISP are in use.
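A minimal sketch of the verbs above as a toy Python state machine, to make the LOCKED vs RUN question concrete. It is not PSP firmware behaviour: the class, the `run` parameter and the `usable()` helper are hypothetical, the device and its TDI are folded into one object for brevity, and the state names follow TDISP's CONFIG_UNLOCKED / CONFIG_LOCKED / RUN.

```python
from enum import Enum


class TdiState(Enum):
    CONFIG_UNLOCKED = 0
    CONFIG_LOCKED = 1   # "LOCKED" in the discussion above
    RUN = 2


class Tdi:
    """Toy model of one TDI together with its parent device."""

    def __init__(self):
        self.connected = False    # device_connect done (SPDM session, IDE keys)
        self.state = TdiState.CONFIG_UNLOCKED
        self.validated = False    # MMIO/IOMMU enabled for the TVM by the TSM


# --- verbs called by the host OS ---
def device_connect(tdi):
    tdi.connected = True


def tdi_bind(tdi, run=True):
    """run=True models bind going to LOCKED and then RUN; run=False stops at LOCKED."""
    if not tdi.connected:
        raise RuntimeError("device_connect has not been called")
    tdi.state = TdiState.CONFIG_LOCKED
    if run:
        tdi.state = TdiState.RUN


def tdi_unbind(tdi):
    tdi.state = TdiState.CONFIG_UNLOCKED
    tdi.validated = False


def device_reclaim(tdi):
    tdi_unbind(tdi)               # assumption: reclaim implies unbind
    tdi.connected = False


# --- verb called by the TVM ---
def tdi_validate(tdi, validate=True):
    if tdi.state is TdiState.CONFIG_UNLOCKED:
        raise RuntimeError("TDI is not bound")
    if validate and tdi.state is TdiState.CONFIG_LOCKED:
        tdi.state = TdiState.RUN  # variant where the TVM's acceptance starts the TDI
    tdi.validated = validate


def usable(tdi):
    """The TDI only works for the TVM once it is in RUN *and* validated."""
    return tdi.state is TdiState.RUN and tdi.validated
```

In both variants the TDI is unusable until tdi_validate has run, which is the point made about RMP+IOMMU above; the variants differ only in which call performs the LOCKED to RUN transition.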
Now, my strawman code does this on the host (I simplified a bit):
- after PCI discovery but before probing: walk through all TDISP-capable
(TEE-IO in PCIe caps) endpoint devices and call device_connect;
Would the host call device_connect unconditionally for all TEE-IO devices
probed on the host? Wouldn't you want to do so only before the first
tdi_bind for a TDI that belongs to the physical device?
Well, in SEV TIO, device_connect enables IDE which has value for the
host on its own.
- when drivers probe - it is all set up and the device measurements are
visible to the driver;
- when constructing a TVM, tdi_bind is called;
Here as well, the tdi_bind could be asynchronous to e.g. support hot
plugging TDIs into TVMs.
I do not really see a huge difference between starting a VM with an already
bound TDISP device or hotplugging a device - either way the host calls
tdi_bind and it does not really care about what the guest is doing at
that moment and when the guest sees a TDISP device - it is always bound.
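A rough sketch of the host-side flow described above, assuming hypothetical `Endpoint` and `Vf` records and stand-in `device_connect` / `tdi_bind` functions that only flip flags. It shows the two points under discussion: device_connect is run for every TEE-IO capable endpoint after discovery, and boot-time assignment and hotplug go through the same tdi_bind path.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Endpoint:
    name: str
    tee_io: bool                 # TEE-IO bit in the PCIe capabilities
    connected: bool = False


@dataclass
class Vf:
    name: str
    parent: Endpoint
    bound_tvm: Optional[str] = None


def device_connect(ep):
    # Stand-in for the TSM call: SPDM session, measurements/certs, IDE keys.
    ep.connected = True


def tdi_bind(vf, tvm):
    # Stand-in for the TSM call; binding needs a connected parent device.
    if not vf.parent.connected:
        raise RuntimeError(f"{vf.parent.name} is not connected")
    vf.bound_tvm = tvm


def host_post_discovery(endpoints):
    """After PCI discovery, before driver probing: connect all TEE-IO endpoints."""
    for ep in endpoints:
        if ep.tee_io:
            device_connect(ep)
    return [ep.name for ep in endpoints if ep.connected]


def assign_vf(vf, tvm):
    """Used both when constructing a TVM and when hotplugging into a running one."""
    tdi_bind(vf, tvm)
```

The alternative raised in the reply, connecting lazily before the first tdi_bind, would move the `device_connect` call from `host_post_discovery` into `assign_vf`.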
and then in the TVM:
- after PCI discovery but before probing: walk through all TDIs (which will
have TEE IO bit set) and call tdi_info, verify the report, if ok - call
tdi_validate;
By verify you mean verify the reported MMIO ranges? With support from
the TSM?
The tdi_validate call to the PSP FW (==TSM) asks the PSP to validate the
MMIO values and enable them in the RMP.
We discussed that a few times, but the device measurements and
attestation report should also be attested, i.e. run against a relying
party. The kernel may not be the right place for that, and I'm proposing
for the guest kernel to rely on a user space component and offload the
attestation part to it. This userspace component would then
synchronously return to the guest kernel with an attestation result.
What bothers me here is that userspace only runs once PCI has been probed,
so by the time userspace is called for attestation the device is up and
running and hosting the rootfs. Userspace will need a knob which
transitions the device into the trusted state (switch from SWIOTLB to
direct DMA, for example). I guess if the userspace is an initramdisk, it
could still reload the driver, which is not doing useful work just yet...
- when drivers probe - it is all set up and the driver decides if/which DMA
mode to use (SWIOTLB or direct), or panic().
When would it panic?
When attestation failed.
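The TVM-side steps could be sketched like this. Everything here is hypothetical naming: `guest_pre_probe`, the dict-based device record and the `verify` callback (which stands in for whatever checks the report, in-kernel or offloaded to a userspace attestation component) are illustrative, and an exception stands in for panic(). Treating non-TEE-IO devices as SWIOTLB-only is an assumption based on the current behaviour mentioned earlier.

```python
class AttestationFailed(Exception):
    """Stand-in for the panic() case: the report did not verify."""


def guest_pre_probe(dev, tdi_info, tdi_validate, verify):
    """Run after PCI discovery and before driver probing; returns the DMA mode."""
    if not dev.get("tee_io"):
        return "swiotlb"              # not a TDI: keep bouncing through SWIOTLB

    report = tdi_info(dev)            # measurements/certs/interface report
    if not verify(report):            # synchronous verdict from the verifier
        raise AttestationFailed(dev["name"])

    tdi_validate(dev)                 # ask the TSM to enable MMIO and IOMMU
    return "direct"
```

Usage with fake callbacks:

```python
mode = guest_pre_probe(
    {"name": "vf0", "tee_io": True},
    tdi_info=lambda dev: {"mmio": "as expected"},
    tdi_validate=lambda dev: None,
    verify=lambda report: report["mmio"] == "as expected",
)
```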
Uff. Too long already. Sorry. Now, on to the problems:
If the user wants only CMA/SPDM,
By user here, you mean the user controlling the host? Or the TVM
user/owner? I assume the former.
Yes, the physical host owner.
Lukas's patches will do that without
the PSP. This may co-exist with the AMD PSP (if the endpoint allows multiple
sessions).
If the user wants only IDE, the AMD PSP's device_connect needs to be called
and the host OS does not get to know the IDE keys. Other vendors allow
programming IDE keys to the RC on bare metal, and this also may co-exist
with a TSM running outside of Linux - the host still manages traffic classes
and streams.
If the user wants TDISP for VMs, this assumes the user does not trust the
host OS and therefore the TSM (which is trusted) has to do CMA/SPDM and IDE.
The TSM code is not Linux and not shared among vendors. CMA/SPDM and IDE
seem capable of co-existing, TDISP does not.
Which makes sense, TDISP is not designed to be used outside of the
TEE-IO VFs assigned to TVM use case.
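The three cases above can be summarised as a small lookup, sketched here under assumptions: the function name, the use-case strings and the `amd_psp` flag are made up for illustration, and the non-AMD "ide_only" row assumes the host OS also runs the SPDM session it needs for key exchange, which the text above does not spell out.

```python
def owners(use_case, amd_psp=True):
    """Return who runs CMA/SPDM and who programs the IDE keys for a use case."""
    if use_case == "cma_spdm_only":
        # Host OS alone (Lukas's patches), no TSM involved.
        return {"cma_spdm": "host OS", "ide": None}
    if use_case == "ide_only":
        if amd_psp:
            # device_connect in the PSP; the host never sees the keys.
            return {"cma_spdm": "TSM", "ide": "TSM"}
        # Vendors that let the host program keys into the RC on bare metal.
        return {"cma_spdm": "host OS", "ide": "host OS"}
    if use_case == "tdisp":
        # Host OS is untrusted, so the TSM has to do both.
        return {"cma_spdm": "TSM", "ide": "TSM"}
    raise ValueError(f"unknown use case: {use_case}")


def host_knows_ide_keys(use_case, amd_psp=True):
    return owners(use_case, amd_psp)["ide"] == "host OS"
```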
However there are common bits.
- certificates/measurements/reports blobs: storing, presenting to the
userspace (results of device_connect and tdi_bind);
- place where we want to authenticate the device and enable IDE
(device_connect);
- place where we want to bind TDI to a TVM (tdi_bind).
I've tried to address this with my (poorly named) drivers/pci/pcie/tdisp.ko
and a hack for VFIO PCI device to call tdi_bind.
The next steps:
- expose blobs via configfs (like Dan did with configfs-tsm);
- s/tdisp.ko/coco.ko/;
- ask the audience - what is missing to make it reusable for other vendors
and uses?
The connect-bind-run flow is similar to the one we have defined for
RISC-V [1]. There we are defining the TEE-IO flows for RISC-V in
detail, but nothing there is architectural and it could somehow apply to
other architectures.
Yeah, it is a good one!
I am still missing the need to have sbi_covg_start_interface() as a
separate step though. Thanks,
Cheers,
Samuel.
[1] https://github.com/riscv-non-isa/riscv-ap-tee-io/blob/main/specification/07-theory_operations.adoc
--
Alexey