Re: IOMMU groups ... PEX8606 switch?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, 2013-12-30 at 16:13 -0800, Dana Goyette wrote:
> On 12/29/2013 08:16 PM, Alex Williamson wrote:
> > On Sat, 2013-12-28 at 23:32 -0800, Dana Goyette wrote:
> >> On 12/28/2013 7:23 PM, Alex Williamson wrote:
> >>> On Sat, 2013-12-28 at 18:31 -0800, Dana Goyette wrote:
> >>>> I have purchased both a SuperMicro X10SAE and an X10SAT, and I need to
> >>>> soon decide which one to keep.
> >>>>
> >>>> The SuperMicro X10SAT has all the PCIe x1 slots hidden behind a PLX
> >>>> PEX8066 switch, which claims to support ACS.  I'd expect the devices
> >>>> downstream of the PLX switch to be in separate groups.
> >>>>
> >>>> With Linux 3.13-rc5 and "enable overrides for missing ACS capabilities"
> >>>> applied and set for the Intel root ports, the devices behind the switch
> >>>> remain stuck in the same group.
> >>>>
> >>>> In terms of passing devices to different VMs, which is better: all
> >>>> devices on different root ports, or all devices behind the one
> >>>> ACS-supporting switch?
> >>>
> >>> Can you provide lspci -vvv info?  If you're getting that for groups
> >>> either the switch has ACS capabilities, but doesn't support the features
> >>> we need or we're doing something wrong.  Thanks,
> >>>
> >> I initially tried attaching the output as a .txt file, but it's too
> >> large.  Anyway, here's the output of lspci -nnvvv (you may notice that I
> >> moved the Radeon to a different slot).
> >
> > Well, something seems amiss since the downstream switch ports all seem
> > to support and enable the correct set of ACS capabilities.  I'm tending
> > to suspect something wrong with the ACS override patch or how it's being
> > used since your IOMMU group is still based at the root port.  Each root
> > port is isolated from the other root ports though, so something is
> > happening with the override patch.  Can you provide the kernel command
> > line you use to enable ACS overrides and the override patch you're
> > using, as it applies to 3.13-rc5?  Thanks,
> >
> > Alex
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe kvm" in
> > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >
> I'm using the original acs-override patch from this post:
> https://lkml.org/lkml/2013/5/30/513
> 
> Kernel parameter is:
> pcie_acs_override=id:8086:8c10,id:8086:8c12,id:8086:8c16,id:8086:8c18
> 
> When booting a kernel without the override patch, the following devices 
> are all in the same group: Intel Root Ports 1, 2, 4, 5; ASMedia SATA 
> controller; PLX PEX8606 switch; Renesas USB controller; TI Firewire 
> controller; Intel I210 Ethernet controller.

Could you please try the patch below and send dmesg for the system once
booted.  This applies directly to upstream and includes the acs override
patch.  Thanks,

Alex

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index b9e9bd8..2481c6c 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2473,6 +2473,16 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 		nomsi	Do not use MSI for native PCIe PME signaling (this makes
 			all PCIe root ports use INTx for all services).
 
+	pcie_acs_override =
+			[PCIE] Override missing PCIe ACS support for:
+		downstream
+			All downstream ports - full ACS capabilties
+		multifunction
+			All multifunction devices - multifunction ACS subset
+		id:nnnn:nnnn
+			Specfic device - full ACS capabilities
+			Specified as vid:did (vendor/device ID) in hex
+
 	pcmv=		[HW,PCMCIA] BadgePAD 4
 
 	pd.		[PARIDE]
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 43b9bfe..b31961d 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -4166,8 +4166,10 @@ static int intel_iommu_add_device(struct device *dev)
 			     pdev->bus->number, pdev->devfn))
 		return -ENODEV;
 
+	pr_info("%s(%s)\n", __func__, pci_name(pdev));
 	bridge = pci_find_upstream_pcie_bridge(pdev);
 	if (bridge) {
+		pr_info("Upstream bridge %s\n", pci_name(bridge));
 		if (pci_is_pcie(bridge))
 			dma_pdev = pci_get_domain_bus_and_slot(
 						pci_domain_nr(pdev->bus),
@@ -4177,9 +4179,13 @@ static int intel_iommu_add_device(struct device *dev)
 	} else
 		dma_pdev = pci_dev_get(pdev);
 
+	pr_info("dma_pdev #1: %s\n", pci_name(dma_pdev));
+
 	/* Account for quirked devices */
 	swap_pci_ref(&dma_pdev, pci_get_dma_source(dma_pdev));
 
+	pr_info("dma_pdev #2: %s\n", pci_name(dma_pdev));
+
 	/*
 	 * If it's a multifunction device that does not support our
 	 * required ACS flags, add to the same group as lowest numbered
@@ -4204,6 +4210,8 @@ static int intel_iommu_add_device(struct device *dev)
 		}
 	}
 
+	pr_info("dma_pdev #3: %s\n", pci_name(dma_pdev));
+
 	/*
 	 * Devices on the root bus go through the iommu.  If that's not us,
 	 * find the next upstream device and test ACS up to the root bus.
@@ -4225,6 +4233,8 @@ static int intel_iommu_add_device(struct device *dev)
 		swap_pci_ref(&dma_pdev, pci_dev_get(bus->self));
 	}
 
+	pr_info("dma_pdev #4: %s\n", pci_name(dma_pdev));
+
 root_bus:
 	group = iommu_group_get(&dma_pdev->dev);
 	pci_dev_put(dma_pdev);
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 07369f3..a6be83a 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -2395,10 +2395,15 @@ static bool pci_acs_flags_enabled(struct pci_dev *pdev, u16 acs_flags)
 {
 	int pos;
 	u16 cap, ctrl;
+	bool ret;
 
 	pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_ACS);
-	if (!pos)
-		return false;
+	if (!pos) {
+		pr_info("%s no ACS capability on %s\n",
+			__func__, pci_name(pdev));
+		ret = false;
+		goto out;
+	}
 
 	/*
 	 * Except for egress control, capabilities are either required
@@ -2409,7 +2414,11 @@ static bool pci_acs_flags_enabled(struct pci_dev *pdev, u16 acs_flags)
 	acs_flags &= (cap | PCI_ACS_EC);
 
 	pci_read_config_word(pdev, pos + PCI_ACS_CTRL, &ctrl);
-	return (ctrl & acs_flags) == acs_flags;
+	ret = (ctrl & acs_flags) == acs_flags;
+out:
+	pr_info("%s(%s, %04x) -> %s\n", __func__, pci_name(pdev), acs_flags,
+		ret ? "true" : "false");
+	return ret;
 }
 
 /**
@@ -2432,17 +2441,24 @@ bool pci_acs_enabled(struct pci_dev *pdev, u16 acs_flags)
 {
 	int ret;
 
+	pr_info("%s(%s, %04x)\n", __func__, pci_name(pdev), acs_flags);
 	ret = pci_dev_specific_acs_enabled(pdev, acs_flags);
-	if (ret >= 0)
+	if (ret >= 0) {
+		pr_info("-> %s\n", ret > 0 ? "true" : "false");
 		return ret > 0;
+	}
 
 	/*
 	 * Conventional PCI and PCI-X devices never support ACS, either
 	 * effectively or actually.  The shared bus topology implies that
 	 * any device on the bus can receive or snoop DMA.
 	 */
-	if (!pci_is_pcie(pdev))
+	if (!pci_is_pcie(pdev)) {
+		pr_info("-> false\n");
 		return false;
+	}
+
+	ret = true;
 
 	switch (pci_pcie_type(pdev)) {
 	/*
@@ -2459,7 +2475,8 @@ bool pci_acs_enabled(struct pci_dev *pdev, u16 acs_flags)
 	 */
 	case PCI_EXP_TYPE_PCI_BRIDGE:
 	case PCI_EXP_TYPE_RC_EC:
-		return false;
+		ret = false;
+		break;
 	/*
 	 * PCIe 3.0, 6.12.1.1 specifies that downstream and root ports should
 	 * implement ACS in order to indicate their peer-to-peer capabilities,
@@ -2467,7 +2484,8 @@ bool pci_acs_enabled(struct pci_dev *pdev, u16 acs_flags)
 	 */
 	case PCI_EXP_TYPE_DOWNSTREAM:
 	case PCI_EXP_TYPE_ROOT_PORT:
-		return pci_acs_flags_enabled(pdev, acs_flags);
+		ret = pci_acs_flags_enabled(pdev, acs_flags);
+		break;
 	/*
 	 * PCIe 3.0, 6.12.1.2 specifies ACS capabilities that should be
 	 * implemented by the remaining PCIe types to indicate peer-to-peer
@@ -2482,14 +2500,15 @@ bool pci_acs_enabled(struct pci_dev *pdev, u16 acs_flags)
 		if (!pdev->multifunction)
 			break;
 
-		return pci_acs_flags_enabled(pdev, acs_flags);
+		ret = pci_acs_flags_enabled(pdev, acs_flags);
 	}
 
 	/*
 	 * PCIe 3.0, 6.12.1.3 specifies no ACS capabilities are applicable
 	 * to single function devices with the exception of downstream ports.
 	 */
-	return true;
+	pr_info("-> %s\n", ret ? "true" : "false");
+	return ret;
 }
 
 /**
@@ -2505,20 +2524,28 @@ bool pci_acs_path_enabled(struct pci_dev *start,
 			  struct pci_dev *end, u16 acs_flags)
 {
 	struct pci_dev *pdev, *parent = start;
+	bool ret = true;
 
+	pr_info("%s(%s, %s, %04x)\n", __func__, pci_name(start),
+		end ? pci_name(end) : "NULL", acs_flags);
 	do {
 		pdev = parent;
 
-		if (!pci_acs_enabled(pdev, acs_flags))
-			return false;
+		if (!pci_acs_enabled(pdev, acs_flags)) {
+			ret = false;
+			break;
+		}
 
-		if (pci_is_root_bus(pdev->bus))
-			return (end == NULL);
+		if (pci_is_root_bus(pdev->bus)) {
+			ret = (end == NULL);
+			break;
+		}
 
 		parent = pdev->bus->self;
 	} while (pdev != end);
 
-	return true;
+	pr_info("-> %s\n", ret ? "true" : "false");
+	return ret;
 }
 
 /**
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 3a02717..1e0ec58 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -3423,11 +3423,130 @@ static int pci_quirk_amd_sb_acs(struct pci_dev *dev, u16 acs_flags)
 #endif
 }
 
+static bool acs_on_downstream;
+static bool acs_on_multifunction;
+
+#define NUM_ACS_IDS 16
+struct acs_on_id {
+	unsigned short vendor;
+	unsigned short device;
+};
+static struct acs_on_id acs_on_ids[NUM_ACS_IDS];
+static u8 max_acs_id;
+
+static __init int pcie_acs_override_setup(char *p)
+{
+	if (!p)
+		return -EINVAL;
+
+	while (*p) {
+		if (!strncmp(p, "downstream", 10)) {
+			acs_on_downstream = true;
+			pr_info("PCIe ACS bypass added for downstream\n");
+		}
+		if (!strncmp(p, "multifunction", 13)) {
+			acs_on_multifunction = true;
+			pr_info("PCIe ACS bypass added for multifunction\n");
+		}
+		if (!strncmp(p, "id:", 3)) {
+			char opt[5];
+			int ret;
+			long val;
+
+			if (max_acs_id >= NUM_ACS_IDS - 1) {
+				pr_warn("Out of PCIe ACS override slots (%d)\n",
+					NUM_ACS_IDS);
+				goto next;
+			}
+
+			p += 3;
+			snprintf(opt, 5, "%s", p);
+			ret = kstrtol(opt, 16, &val);
+			if (ret) {
+				pr_warn("PCIe ACS ID parse error %d\n", ret);
+				goto next;
+			}
+			acs_on_ids[max_acs_id].vendor = val;
+
+			p += strcspn(p, ":");
+			if (*p != ':') {
+				pr_warn("PCIe ACS invalid ID\n");
+				goto next;
+			}
+
+			p++;
+			snprintf(opt, 5, "%s", p);
+			ret = kstrtol(opt, 16, &val);
+			if (ret) {
+				pr_warn("PCIe ACS ID parse error %d\n", ret);
+				goto next;
+			}
+			acs_on_ids[max_acs_id].device = val;
+			pr_info("PCIe ACS bypass added for %04x:%04x\n",
+				acs_on_ids[max_acs_id].vendor,
+				acs_on_ids[max_acs_id].device);
+			max_acs_id++;
+		}
+next:
+		p += strcspn(p, ",");
+		if (*p == ',')
+			p++;
+	}
+
+	if (acs_on_downstream || acs_on_multifunction || max_acs_id)
+		pr_warn("Warning: PCIe ACS overrides enabled; This may allow non-IOMMU protected peer-to-peer DMA\n");
+
+	return 0;
+}
+early_param("pcie_acs_override", pcie_acs_override_setup);
+
+static int pcie_acs_overrides(struct pci_dev *dev, u16 acs_flags)
+{
+	int i;
+
+	/* Never override ACS for legacy devices or devices with ACS caps */
+	if (!pci_is_pcie(dev) /* ||
+	    pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ACS) */)
+		return -ENOTTY;
+
+	for (i = 0; i < max_acs_id; i++) {
+		if (acs_on_ids[i].vendor == dev->vendor &&
+		    acs_on_ids[i].device == dev->device) {
+			pr_info("PCIe ACS override match for %s\n",
+				 pci_name(dev));
+			return 1;
+		}
+	}
+
+	switch (pci_pcie_type(dev)) {
+	case PCI_EXP_TYPE_DOWNSTREAM:
+	case PCI_EXP_TYPE_ROOT_PORT:
+		if (acs_on_downstream) {
+			pr_info("PCIe ACS override downstream %s\n",
+				 pci_name(dev));
+			return 1;
+		}
+		break;
+	case PCI_EXP_TYPE_ENDPOINT:
+	case PCI_EXP_TYPE_UPSTREAM:
+	case PCI_EXP_TYPE_LEG_END:
+	case PCI_EXP_TYPE_RC_END:
+		if (acs_on_multifunction && dev->multifunction) {
+			pr_info("PCIe ACS override multifunction %s\n",
+				 pci_name(dev));
+			return 1;
+		}
+	}
+
+	return -ENOTTY;
+}
+
 static const struct pci_dev_acs_enabled {
 	u16 vendor;
 	u16 device;
 	int (*acs_enabled)(struct pci_dev *dev, u16 acs_flags);
 } pci_dev_acs_enabled[] = {
+	{ PCI_ANY_ID, PCI_ANY_ID, pcie_acs_overrides },
 	{ PCI_VENDOR_ID_ATI, 0x4385, pci_quirk_amd_sb_acs },
 	{ PCI_VENDOR_ID_ATI, 0x439c, pci_quirk_amd_sb_acs },
 	{ PCI_VENDOR_ID_ATI, 0x4383, pci_quirk_amd_sb_acs },


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux