On Tue, 26 May 2020 15:17:35 -0700
Ashok Raj <ashok.raj@xxxxxxxxx> wrote:

> All Intel platforms guarantee that all root complex implementations
> must send transactions up to IOMMU for address translations. Hence for
> RCiEP devices that are Vendor ID Intel, can claim exception for lack of
> ACS support.
>
>
> 3.16 Root-Complex Peer to Peer Considerations
> When DMA remapping is enabled, peer-to-peer requests through the
> Root-Complex must be handled
> as follows:
> • The input address in the request is translated (through first-level,
>   second-level or nested translation) to a host physical address (HPA).
>   The address decoding for peer addresses must be done only on the
>   translated HPA. Hardware implementations are free to further limit
>   peer-to-peer accesses to specific host physical address regions
>   (or to completely disallow peer-forwarding of translated requests).
> • Since address translation changes the contents (address field) of
>   the PCI Express Transaction Layer Packet (TLP), for PCI Express
>   peer-to-peer requests with ECRC, the Root-Complex hardware must use
>   the new ECRC (re-computed with the translated address) if it
>   decides to forward the TLP as a peer request.
> • Root-ports, and multi-function root-complex integrated endpoints, may
>   support additional peer-to-peer control features by supporting PCI Express
>   Access Control Services (ACS) capability. Refer to ACS capability in
>   PCI Express specifications for details.
>
> Since Linux didn't give special treatment to allow this exception, certain
> RCiEP MFD devices are getting grouped in a single iommu group. This
> doesn't permit a single device to be assigned to a guest for instance.
>
> In one vendor system: Device 14.x were grouped in a single IOMMU group.
>
> /sys/kernel/iommu_groups/5/devices/0000:00:14.0
> /sys/kernel/iommu_groups/5/devices/0000:00:14.2
> /sys/kernel/iommu_groups/5/devices/0000:00:14.3
>
> After the patch:
> /sys/kernel/iommu_groups/5/devices/0000:00:14.0
> /sys/kernel/iommu_groups/5/devices/0000:00:14.2
> /sys/kernel/iommu_groups/6/devices/0000:00:14.3 <<< new group
>
> 14.0 and 14.2 are integrated devices, but legacy end points.
> Whereas 14.3 was a PCIe compliant RCiEP.
>
> 00:14.3 Network controller: Intel Corporation Device 9df0 (rev 30)
> Capabilities: [40] Express (v2) Root Complex Integrated Endpoint, MSI 00
>
> This permits assigning this device to a guest VM.
>
> Fixes: f096c061f552 ("iommu: Rework iommu_group_get_for_pci_dev()")
> Signed-off-by: Ashok Raj <ashok.raj@xxxxxxxxx>
> To: Joerg Roedel <joro@xxxxxxxxxx>
> To: Bjorn Helgaas <bhelgaas@xxxxxxxxxx>
> Cc: linux-kernel@xxxxxxxxxxxxxxx
> Cc: iommu@xxxxxxxxxxxxxxxxxxxxxxxxxx
> Cc: Lu Baolu <baolu.lu@xxxxxxxxxxxxxxx>
> Cc: Alex Williamson <alex.williamson@xxxxxxxxxx>
> Cc: Darrel Goeddel <DGoeddel@xxxxxxxxxxxxxx>
> Cc: Mark Scott <mscott@xxxxxxxxxxxxxx>,
> Cc: Romil Sharma <rsharma@xxxxxxxxxxxxxx>
> Cc: Ashok Raj <ashok.raj@xxxxxxxxx>
> ---
>  drivers/iommu/iommu.c | 13 ++++++++++++-
>  1 file changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 2b471419e26c..31b595dfedde 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -1187,7 +1187,18 @@ static struct iommu_group *get_pci_function_alias_group(struct pci_dev *pdev,
>  	struct pci_dev *tmp = NULL;
>  	struct iommu_group *group;
>
> -	if (!pdev->multifunction || pci_acs_enabled(pdev, REQ_ACS_FLAGS))
> +	/*
> +	 * Intel VT-d Specification Section 3.16, Root-Complex Peer to Peer
> +	 * Considerations mandate that all transactions in RCiEP's and
> +	 * even Integrated MFD's *must* be sent up to the IOMMU. P2P is
> +	 * only possible on translated addresses. This gives enough
> +	 * guarantee that such devices can be forgiven for lack of ACS
> +	 * support.
> +	 */
> +	if (!pdev->multifunction ||
> +	    (pdev->vendor == PCI_VENDOR_ID_INTEL &&
> +	     pci_pcie_type(pdev) == PCI_EXP_TYPE_RC_END) ||
> +	    pci_acs_enabled(pdev, REQ_ACS_FLAGS))
>  		return NULL;
>
>  	for_each_pci_dev(tmp) {

Hi Ashok,

As this is an Intel/VT-d standard, not a PCIe standard, why not
implement this in pci_dev_specific_acs_enabled() with all the other
quirks?

Thanks,
Alex
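
To make the suggestion concrete, here is a rough userspace sketch of the
quirk-table shape, not kernel code and not a proposed patch. It assumes the
device-specific hook is a table of (vendor, device, callback) entries that
is consulted before the real ACS capability is checked. Every mock_* and
MOCK_*/QUIRK_* name below is hypothetical and exists only for this
illustration; the three PCI_* constants reuse the kernel's values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Values match the kernel headers; everything else here is a mock. */
#define PCI_VENDOR_ID_INTEL  0x8086
#define PCI_ANY_ID           0xffff  /* 16-bit wildcard, for this mock only */
#define PCI_EXP_TYPE_RC_END  0x9     /* Root Complex Integrated Endpoint */

/* Hypothetical mock-only values. */
#define MOCK_NO_PCIE_CAP     (-1)    /* legacy function, no Express capability */
#define QUIRK_NOT_APPLICABLE (-1)    /* stands in for a negative errno */

/* Hypothetical stand-in for the few struct pci_dev fields the check reads. */
struct mock_pci_dev {
	uint16_t vendor;
	uint16_t device;
	int pcie_type;       /* what pci_pcie_type() would report */
	bool multifunction;
	bool hw_acs;         /* real ACS capability with the required flags */
};

struct mock_acs_quirk {
	uint16_t vendor;
	uint16_t device;
	int (*acs_enabled)(const struct mock_pci_dev *dev);
};

/* Assumption from the patch: Intel RCiEPs always route through the IOMMU. */
int mock_quirk_intel_rciep_acs(const struct mock_pci_dev *dev)
{
	if (dev->pcie_type != PCI_EXP_TYPE_RC_END)
		return QUIRK_NOT_APPLICABLE;
	return 1; /* treat as if ACS isolation were enabled */
}

static const struct mock_acs_quirk mock_acs_quirks[] = {
	{ PCI_VENDOR_ID_INTEL, PCI_ANY_ID, mock_quirk_intel_rciep_acs },
};

/* Walk the table; the first quirk that gives a non-negative answer wins. */
int mock_dev_specific_acs_enabled(const struct mock_pci_dev *dev)
{
	assert(dev != NULL);
	for (size_t i = 0; i < sizeof(mock_acs_quirks) / sizeof(mock_acs_quirks[0]); i++) {
		const struct mock_acs_quirk *q = &mock_acs_quirks[i];

		if (q->vendor != dev->vendor && q->vendor != PCI_ANY_ID)
			continue;
		if (q->device != dev->device && q->device != PCI_ANY_ID)
			continue;

		int ret = q->acs_enabled(dev);
		if (ret >= 0)
			return ret;
	}
	return QUIRK_NOT_APPLICABLE;
}

/* Quirks first, then fall back to the hardware ACS capability. */
bool mock_acs_enabled(const struct mock_pci_dev *dev)
{
	int ret = mock_dev_specific_acs_enabled(dev);

	return ret >= 0 ? ret > 0 : dev->hw_acs;
}

/* Same shape as the unpatched early return in the grouping code. */
bool mock_function_is_isolated(const struct mock_pci_dev *dev)
{
	return !dev->multifunction || mock_acs_enabled(dev);
}
```

With this shape the condition in get_pci_function_alias_group() would not
need to change: a multifunction Intel RCiEP such as 00:14.3 (device 9df0)
is reported as isolated by the quirk, while the legacy functions 14.0 and
14.2 fall through to the ordinary ACS check and stay grouped together.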