> From: Jean-Philippe Brucker > Sent: Tuesday, February 13, 2018 8:57 PM > > On 13/02/18 07:54, Tian, Kevin wrote: > >> From: Jean-Philippe Brucker > >> Sent: Tuesday, February 13, 2018 2:33 AM > >> > >> Add bind() and unbind() operations to the IOMMU API. Device drivers > can > >> use them to share process page tables with their devices. bind_group() > >> is provided for VFIO's convenience, as it needs to provide a coherent > >> interface on containers. Other device drivers will most likely want to > >> use bind_device(), which binds a single device in the group. > > > > I saw your bind_group implementation tries to bind the address space > > for all devices within a group, which IMO has some problem. Based on > PCIe > > spec, packet routing on the bus doesn't take PASID into consideration. > > since devices within same group cannot be isolated based on requestor- > ID > > i.e. traffic not guaranteed going to IOMMU, enabling SVA on multiple > devices > > could cause undesired p2p. > But so does enabling "classic" DMA... If two devices are not protected by > ACS for example, they are put in the same IOMMU group, and one device > might be able to snoop the other's DMA. VFIO allows userspace to create a > container for them and use MAP/UNMAP, but makes it explicit to the user > that for DMA, these devices are not isolated and must be considered as a > single device (you can't pass them to different VMs or put them in > different containers). So I tried to keep the same idea as MAP/UNMAP for > SVA, performing BIND/UNBIND operations on the VFIO container instead of > the device. there is a small difference. for classic DMA we can reserve PCI BARs when allocating IOVA, thus multiple devices in the same group can still work correctly applied with same translation, if isolation is not cared in between. However for SVA it's CPU virtual addresses managed by kernel mm thus difficult to introduce similar address reservation. Then it's possible for a VA falling into other device's BAR in the same group and cause undesired p2p traffic. In such regard, SVA is actually functionally-broken. > > I kept the analogy simple though, because I don't think there will be many > SVA-capable systems that require IOMMU groups. They will likely I agree that multiple SVA-capable devices in same IOMMU group is not a typical configuration, especially it's usually observed on new devices. Then based on above limitation, I think we could just explicitly avoid enabling SVA in such case. :-) > implement > proper device isolation. Unlike iommu_attach_device(), bind_device() > doesn't call bind_group(), because keeping bonds consistent in groups is > complicated, not worth implementing (drivers can explicitly bind() all > devices that need it) and probably wouldn't ever be used. I also can't > test it. But maybe we could implement the following for now: > > * bind_device() fails if the device's group has more than one device, > otherwise calls __bind_device(). This prevents device drivers that are > oblivious to IOMMU groups from opening a backdoor. > > * bind_group() calls __bind_device() for all devices in group. This way > users that are aware of IOMMU groups can still use them safely. Note that > at the moment bind_group() fails as soon as it finds a device that doesn't > support SVA. Having all devices support SVA in a given group is > unrealistic and this behavior ought to be improved. > > * hotplugging a device into a group still succeeds even if the group > already has mm bonds. Same happens for classic DMA, a hotplugged > device > will have access to all mappings already present in the domain. > > > If my understanding of PCIe spec is correct, probably we should fail > > calling bind_group()/bind_device() when there are multiple devices within > > the given group. If only one device then bind_group is essentially a > wrapper > > to bind_device.>> > >> Regardless of the IOMMU group or domain a device is in, device drivers > >> should call bind() for each device that will use the PASID. > >> > [...] > >> +/** > >> + * iommu_sva_bind_device() - Bind a process address space to a device > >> + * @dev: the device > >> + * @mm: the mm to bind, caller must hold a reference to it > >> + * @pasid: valid address where the PASID will be stored > >> + * @flags: bond properties (IOMMU_SVA_FEAT_*) > >> + * @drvdata: private data passed to the mm exit handler > >> + * > >> + * Create a bond between device and task, allowing the device to access > >> the mm > >> + * using the returned PASID. A subsequent bind() for the same device > and > >> mm will > >> + * reuse the bond (and return the same PASID), but users will have to > call > >> + * unbind() twice. > > > > what's the point of requiring unbind twice? > > Mmh, that was necessary when we kept bond information as domain<- > >mm, but > since it's now device<->mm, we can probably remove the bond refcount. I > consider that a bind() between a given device and mm will always be issued > by the same driver. > > Thanks, > Jean -- To unsubscribe from this list: send the line "unsubscribe linux-acpi" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html