Re: [RFC PATCH v2 14/22] iommufd: Add TIO calls

Xu Yilun <yilun.xu@xxxxxxxxxxxxxxx> · Wed, 26 Feb 2025 18:49:18 +0800

On Wed, Feb 26, 2025 at 11:12:32AM +1100, Alexey Kardashevskiy wrote:
> 
> 
> On 25/2/25 20:00, Xu Yilun wrote:
> > On Tue, Feb 18, 2025 at 10:10:01PM +1100, Alexey Kardashevskiy wrote:
> > > When a TDISP-capable device is passed through, it is configured as
> > > a shared device to begin with. Later on when a VM probes the device,
> > > detects its TDISP capability (reported via the PCIe ExtCap bit
> > > called "TEE-IO"), performs the device attestation and transitions it
> > > to a secure state when the device can run encrypted DMA and respond
> > > to encrypted MMIO accesses.
> > > 
> > > Since KVM is out of the TCB, secure enablement is done in the secure
> > > firmware. The API requires PCI host/guest BDFns, a KVM id hence such
> > > calls are routed via IOMMUFD, primarily because allowing secure DMA
> > > is the major performance bottleneck and it is a function of IOMMU.
> > 
> > I still have concern about the vdevice interface for bind. Bind put the
> > device to LOCKED state, so is more of a device configuration rather
> > than an iommu configuration. So seems more reasonable put the API in VFIO?
> 
> IOMMUFD means pretty much VFIO (in the same way "VFIO means KVM" as 95+% of
> VFIO users use it from KVM, although VFIO works fine without KVM) so not
> much difference where to put this API and can be done either way. VFIO is

Er... I cannot agree. There are clear responsibilities for
VFIO/IOMMUFD/KVM each. They don't overlap each other. So I don't think
either way is OK. VFIO still controls the overall device behavior
and it is VFIO's decision to hand over user DMA setup to IOMMUFD. IIUC
that's why VFIO_DEVICE_ATTACH_IOMMUFD_PT should be a VFIO API.

E.g. I don't think VFIO driver would expect its MMIO access suddenly
failed without knowing what happened.

> reasonable, the immediate problem is that IOMMUFD's vIOMMU knows the guest
> BDFn (well, for AMD) and VFIO PCI does not.

For Intel, it is host BDF. But I think this is TSM architecture
difference that could be hidden in TSM framework. From TSM caller's POV,
it could just be a magic number identifying the TDI.

Back to your concern, I don't think it is a problem. From your patch,
vIOMMU doesn't know the guest BDFn by nature, it is just the user
stores the id in vdevice via iommufd_vdevice_alloc_ioctl(). A proper
VFIO API could also do this work.

I'm suggesting a VFIO API:

/*
 * @tdi_id: A TSM recognizable TDI identifier
 *	    On input, user suggests the TDI identifier number for TSM.
 *	    On output, TSM's decision of the TDI identifier number.
 */
struct vfio_pci_tsm_bind {
	__u32 argsz;
	__u32 flags;
	__u32 tdi_id;
	__u32 pad;
};

#define VFIO_DEVICE_TSM_BIND		_IO(VFIO_TYPE, VFIO_BASE + 22)

I need the tdi_id as output cause I don't want any outside TSM user and
Guest to assume what the TDI id should be.

static int vfio_pci_ioctl_tsm_bind(struct vfio_pci_core_device *vdev,
				   void __user *arg)
{
	unsigned long minsz = offsetofend(struct vfio_pci_tsm_bind, tdi_id);
	struct pci_dev *pdev = vdev->pdev;
	struct kvm *kvm = vdev->vdev.kvm;
	struct vfio_pci_tsm_bind bind;

	if (copy_from_user(&bind, arg, minsz))
		return -EFAULT;

	ret = pci_tsm_dev_bind(pdev, kvm, &bind.tdi_id);

}

A call to TSM makes TSM driver know the tdi_id and could find the real
device inside TSM via tdi_id. Following TSM call could directly use
tdi_id as parameter.

The implementation is basically no difference from:

+       vdev = container_of(iommufd_get_object(ucmd->ictx, cmd->vdevice_id,
+                                              IOMMUFD_OBJ_VDEVICE),

The real concern is the device owner, VFIO, should initiate the bind.

> 
> 
> > > Add TDI bind to do the initial binding of a passed through PCI
> > > function to a VM. Add a forwarder for TIO GUEST REQUEST. These two
> > > call into the TSM which forwards the calls to the PSP.
> > > 
> > > Signed-off-by: Alexey Kardashevskiy <aik@xxxxxxx>
> > > ---
> > > 
> > > Both enabling secure DMA (== "SDTE Write") and secure MMIO (== "MMIO
> > > validate") are TIO GUEST REQUEST messages. These are encrypted and
> > > the HV (==IOMMUFD or KVM or VFIO) cannot see them unless the guest
> > > shares some via kvm_run::kvm_user_vmgexit (and then QEMU passes those
> > > via ioctls).
> > > 
> > > This RFC routes all TIO GUEST REQUESTs via IOMMUFD which arguably should
> > > only do so only for "SDTE Write" and leave "MMIO validate" for VFIO.
> > 
> > The fact is HV cannot see the guest requests, even I think HV never have
> > to care about the guest requests. HV cares until bind, then no HV side
> > MMIO & DMA access is possible, any operation/state after bind won't
> > affect HV more. And HV could always unbind to rollback guest side thing.
> > 
> > That said guest requests are nothing to do with any host side component,
> > iommu or vfio. It is just the message posting between VM & firmware. I
> > suppose KVM could directly do it by calling TSM driver API.
> 
> No, it could not as the HV needs to add the host BDFn to the guest's request
> before calling the firmware and KVM does not have that knowledge.

I think if TSM has knowledge about tdi_id, KVM doesn't have to know host BDFn.
Just let TSM handle the vendor difference. Not sure if this solves all
the problem.

> 
> These guest requests are only partly encrypted as the guest needs
> cooperation from the HV. The guest BDFn comes unencrypted from the VM to let
> the HV find the host BDFn and do the bind.

It is not about HV never touch any message content. It is about HV
doesn't (and shouldn't, since some info is encrypted) influence any host
behavior by executing guest request, so no need to route to any other
component.

Thanks,
Yilun

> 
> Also, say, in order to enable MMIO range, the host needs to "rmpupdate"
> MMIOs first (and then the firmware does "pvalidate") so it needs to know the
> range which is in unencrypted part of guest request.
> 
> Here is a rough idea: https://github.com/aik/qemu/commit/f804b65aff5b
> 
> A TIO Guest request is made of:
> - guest page with unencrypted header (msg type is essential) and encrypted
> body for consumption by the firmware;
> - a couple of 64bit bit fields and RAX/RBX/... in shared GHCB page.
> 
> Thanks,
> 
> > Thanks,
> > Yilun
> 
> -- 
> Alexey
>