Re: [PATCH 02/37] iommu/sva: Bind process address spaces to devices

Jean-Philippe Brucker <jean-philippe.brucker@xxxxxxx> · Tue, 13 Feb 2018 12:57:23 +0000

On 13/02/18 07:54, Tian, Kevin wrote:
>> From: Jean-Philippe Brucker
>> Sent: Tuesday, February 13, 2018 2:33 AM
>>
>> Add bind() and unbind() operations to the IOMMU API. Device drivers can
>> use them to share process page tables with their devices. bind_group()
>> is provided for VFIO's convenience, as it needs to provide a coherent
>> interface on containers. Other device drivers will most likely want to
>> use bind_device(), which binds a single device in the group.
> 
> I saw your bind_group implementation tries to bind the address space
> for all devices within a group, which IMO has some problem. Based on PCIe
> spec, packet routing on the bus doesn't take PASID into consideration. 
> since devices within same group cannot be isolated based on requestor-ID
> i.e. traffic not guaranteed going to IOMMU, enabling SVA on multiple devices
> could cause undesired p2p.
But so does enabling "classic" DMA... If two devices are not protected by
ACS for example, they are put in the same IOMMU group, and one device
might be able to snoop the other's DMA. VFIO allows userspace to create a
container for them and use MAP/UNMAP, but makes it explicit to the user
that for DMA, these devices are not isolated and must be considered as a
single device (you can't pass them to different VMs or put them in
different containers). So I tried to keep the same idea as MAP/UNMAP for
SVA, performing BIND/UNBIND operations on the VFIO container instead of
the device.

I kept the analogy simple though, because I don't think there will be many
SVA-capable systems that require IOMMU groups. They will likely implement
proper device isolation. Unlike iommu_attach_device(), bind_device()
doesn't call bind_group(), because keeping bonds consistent in groups is
complicated, not worth implementing (drivers can explicitly bind() all
devices that need it) and probably wouldn't ever be used. I also can't
test it. But maybe we could implement the following for now:

* bind_device() fails if the device's group has more than one device,
otherwise calls __bind_device(). This prevents device drivers that are
oblivious to IOMMU groups from opening a backdoor.

* bind_group() calls __bind_device() for all devices in group. This way
users that are aware of IOMMU groups can still use them safely. Note that
at the moment bind_group() fails as soon as it finds a device that doesn't
support SVA. Having all devices support SVA in a given group is
unrealistic and this behavior ought to be improved.

* hotplugging a device into a group still succeeds even if the group
already has mm bonds. Same happens for classic DMA, a hotplugged device
will have access to all mappings already present in the domain.

> If my understanding of PCIe spec is correct, probably we should fail 
> calling bind_group()/bind_device() when there are multiple devices within 
> the given group. If only one device then bind_group is essentially a wrapper
> to bind_device.>>
>> Regardless of the IOMMU group or domain a device is in, device drivers
>> should call bind() for each device that will use the PASID.
>>
[...]
>> +/**
>> + * iommu_sva_bind_device() - Bind a process address space to a device
>> + * @dev: the device
>> + * @mm: the mm to bind, caller must hold a reference to it
>> + * @pasid: valid address where the PASID will be stored
>> + * @flags: bond properties (IOMMU_SVA_FEAT_*)
>> + * @drvdata: private data passed to the mm exit handler
>> + *
>> + * Create a bond between device and task, allowing the device to access
>> the mm
>> + * using the returned PASID. A subsequent bind() for the same device and
>> mm will
>> + * reuse the bond (and return the same PASID), but users will have to call
>> + * unbind() twice.
> 
> what's the point of requiring unbind twice?

Mmh, that was necessary when we kept bond information as domain<->mm, but
since it's now device<->mm, we can probably remove the bond refcount. I
consider that a bind() between a given device and mm will always be issued
by the same driver.

Thanks,
Jean
--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html