On 13/02/18 23:34, Tian, Kevin wrote:
>> From: Jean-Philippe Brucker
>> Sent: Tuesday, February 13, 2018 8:57 PM
>>
>> On 13/02/18 07:54, Tian, Kevin wrote:
>>>> From: Jean-Philippe Brucker
>>>> Sent: Tuesday, February 13, 2018 2:33 AM
>>>>
>>>> Add bind() and unbind() operations to the IOMMU API. Device drivers can
>>>> use them to share process page tables with their devices. bind_group()
>>>> is provided for VFIO's convenience, as it needs to provide a coherent
>>>> interface on containers. Other device drivers will most likely want to
>>>> use bind_device(), which binds a single device in the group.
>>>
>>> I saw your bind_group implementation tries to bind the address space
>>> for all devices within a group, which IMO has some problem. Based on PCIe
>>> spec, packet routing on the bus doesn't take PASID into consideration.
>>> since devices within same group cannot be isolated based on requestor-ID
>>> i.e. traffic not guaranteed going to IOMMU, enabling SVA on multiple devices
>>> could cause undesired p2p.
>> But so does enabling "classic" DMA... If two devices are not protected by
>> ACS for example, they are put in the same IOMMU group, and one device
>> might be able to snoop the other's DMA. VFIO allows userspace to create a
>> container for them and use MAP/UNMAP, but makes it explicit to the user
>> that for DMA, these devices are not isolated and must be considered as a
>> single device (you can't pass them to different VMs or put them in
>> different containers). So I tried to keep the same idea as MAP/UNMAP for
>> SVA, performing BIND/UNBIND operations on the VFIO container instead of
>> the device.
>
> there is a small difference. for classic DMA we can reserve PCI BARs
> when allocating IOVA, thus multiple devices in the same group can
> still work correctly applied with same translation, if isolation is not
> cared in between.
> However for SVA it's CPU virtual addresses
> managed by kernel mm thus difficult to introduce similar address
> reservation. Then it's possible for a VA falling into other device's
> BAR in the same group and cause undesired p2p traffic. In such
> regard, SVA is actually functionally-broken.

I think the problem exists even if there is a single device in the group.
If, for example, malloc() returns a VA that corresponds to a PCI host
bridge window in IOVA space, performing DMA on that buffer won't reach the
IOMMU and will cause undesirable side-effects.

My series doesn't address the problem, but I believe we should carve
reserved regions out of the process address space during bind(), for
example by creating a PROT_NONE vma that prevents userspace from obtaining
that VA.

If you solve this problem, you also solve it for multiple devices in a
group, because the IOMMU core provides the resv API on groups... at least
until you hotplug a device with different resv regions into a live group
(which currently triggers a WARN in VFIO).

>> I kept the analogy simple though, because I don't think there will be many
>> SVA-capable systems that require IOMMU groups. They will likely
>
> I agree that multiple SVA-capable devices in same IOMMU group is not
> a typical configuration, especially it's usually observed on new devices.
> Then based on above limitation, I think we could just explicitly avoid
> enabling SVA in such case. :-)

I'd certainly like that :)

Thanks,
Jean
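The idea of carving a reserved IOVA window out of a process address space with a PROT_NONE mapping can be illustrated from userspace. The C sketch below is only an analogy under stated assumptions: the proposal in the thread is for the kernel to create the PROT_NONE vma during bind(), whereas here the process does it itself with mmap(). The window address 0xfe000000, its 1 MiB size, and the helpers struct resv_region, overlaps(), reserve_region() and alloc_avoids_region() are all hypothetical; a real window would come from the group's reserved-region list. It assumes a 64-bit Linux system on which that address range is free.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS and MAP_NORESERVE */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical IOVA window (e.g. a peer BAR or a host-bridge window) that
 * must never be handed out as a DMA address. */
struct resv_region {
    uintptr_t start;
    size_t len;
};

/* True if [va, va + len) overlaps the reserved window. */
static bool overlaps(const struct resv_region *r, uintptr_t va, size_t len)
{
    return va < r->start + r->len && r->start < va + len;
}

/* Occupy the window with an inaccessible PROT_NONE mapping so that later
 * mmap()/malloc() calls cannot return addresses inside it. Without
 * MAP_FIXED the address is only a hint, so verify it was honoured.
 * Returns 0 on success, -1 on failure. */
static int reserve_region(const struct resv_region *r)
{
    void *want = (void *)r->start;
    void *got;

    assert(r->len != 0);
    got = mmap(want, r->len, PROT_NONE,
               MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    if (got == MAP_FAILED)
        return -1;
    if (got != want) {
        /* The kernel placed it elsewhere: the window was not free. */
        munmap(got, r->len);
        return -1;
    }
    return 0;
}

/* Request one page with an address hint inside the window and report
 * whether the kernel placed it outside. Meaningful after reserve_region(). */
static bool alloc_avoids_region(const struct resv_region *r)
{
    size_t pg = (size_t)sysconf(_SC_PAGESIZE);
    void *buf = mmap((void *)(r->start + pg), pg, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    bool ok;

    if (buf == MAP_FAILED)
        return false;
    ok = !overlaps(r, (uintptr_t)buf, pg);
    munmap(buf, pg);
    return ok;
}
```

After reserve_region() succeeds on the window, alloc_avoids_region() should return true, because the hinted range is already occupied and the kernel has to pick another address. A kernel-side solution would also have to deal with windows that are already in use when bind() is called, which this sketch simply reports as a failure.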