Re: [PATCH V4] PCI: Extend ACS configurability

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 25. 06. 24, 17:31, Vidya Sagar wrote:
PCIe ACS settings control the level of isolation and the possible P2P
paths between devices. With greater isolation the kernel will create
smaller iommu_groups and with less isolation there is more HW that
can achieve P2P transfers. From a virtualization perspective all
devices in the same iommu_group must be assigned to the same VM as
they lack security isolation.

There is no way for the kernel to automatically know the correct
ACS settings for any given system and workload. Existing command line
options (ex:- disable_acs_redir) allow only for large scale change,
disabling all isolation, but this is not sufficient for more complex cases.

Add a kernel command-line option 'config_acs' to directly control all the
ACS bits for specific devices, which allows the operator to setup the
right level of isolation to achieve the desired P2P configuration.
The definition is future proof, when new ACS bits are added to the spec
the open syntax can be extended.

ACS needs to be setup early in the kernel boot as the ACS settings
effect how iommu_groups are formed. iommu_group formation is a one
time event during initial device discovery, changing ACS bits after
kernel boot can result in an inaccurate view of the iommu_groups
compared to the current isolation configuration.

ACS applies to PCIe Downstream Ports and multi-function devices.
The default ACS settings are strict and deny any direct traffic
between two functions. This results in the smallest iommu_group the
HW can support. Frequently these values result in slow or
non-working P2PDMA.

ACS offers a range of security choices controlling how traffic is
allowed to go directly between two devices. Some popular choices:
   - Full prevention
   - Translated requests can be direct, with various options
   - Asymmetric direct traffic, A can reach B but not the reverse
   - All traffic can be direct
Along with some other less common ones for special topologies.

The intention is that this option would be used with expert knowledge
of the HW capability and workload to achieve the desired
configuration.

Hi,

this breaks ACS on some platforms (in 6.11). See:
https://bugzilla.suse.com/show_bug.cgi?id=1229019

When starting a KVM:
"Error starting domain: internal error: QEMU unexpectedly closed the monitor (vm=‘win10’): qxl_send_events: spice-server bug: guest stopped, ignoring
2024-08-08T01:45:51.824474Z qemu-system-x86_64: -device {“driver”:“vfio-pci”,“host”:“0000:0b:00.0”,“id”:“hostdev0”,“bus”:“pci.2”,“addr”:“0x0”}: vfio 0000:0b:00.0: group 83 is not viable
Please ensure all devices within the iommu_group are bound to their vfio bus driver."



We backported the commit to 6.4, there they see:
The good kernel 6.4.0-150600.23.14 log has this:

[    0.618918] pci 0000:00:1c.0: Intel PCH root port ACS workaround enabled
[    0.619153] pci 0000:00:1c.4: Intel PCH root port ACS workaround enabled

the bad one 6.4.0-150600.23.22 does not.

But the same difference is with 6.10 vs 6.11.

Any ideas? It looks like workarounds are not applied anymore.

Signed-off-by: Vidya Sagar <vidyas@xxxxxxxxxx>
---
v4:
* Changed commit message (Courtesy: Jason) to provide more details

v3:
* Fixed a documentation issue reported by kernel test bot

v2:
* Refactored the code as per Jason's suggestion

  .../admin-guide/kernel-parameters.txt         |  22 +++
  drivers/pci/pci.c                             | 148 +++++++++++-------
  2 files changed, 112 insertions(+), 58 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 500cfa776225..42d0f6fd40d0 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -4619,6 +4619,28 @@
  				bridges without forcing it upstream. Note:
  				this removes isolation between devices and
  				may put more devices in an IOMMU group.
+		config_acs=
+				Format:
+				=<ACS flags>@<pci_dev>[; ...]
+				Specify one or more PCI devices (in the format
+				specified above) optionally prepended with flags
+				and separated by semicolons. The respective
+				capabilities will be enabled, disabled or unchanged
+				based on what is specified in flags.
+				ACS Flags is defined as follows
+				bit-0 : ACS Source Validation
+				bit-1 : ACS Translation Blocking
+				bit-2 : ACS P2P Request Redirect
+				bit-3 : ACS P2P Completion Redirect
+				bit-4 : ACS Upstream Forwarding
+				bit-5 : ACS P2P Egress Control
+				bit-6 : ACS Direct Translated P2P
+				Each bit can be marked as
+				‘0‘ – force disabled
+				‘1’ – force enabled
+				‘x’ – unchanged.
+				Note: this may remove isolation between devices
+				and may put more devices in an IOMMU group.
  		force_floating	[S390] Force usage of floating interrupts.
  		nomio		[S390] Do not use MIO instructions.
  		norid		[S390] ignore the RID field and force use of
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 55078c70a05b..6661932afe59 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -946,30 +946,67 @@ void pci_request_acs(void)
  }
static const char *disable_acs_redir_param;
+static const char *config_acs_param;
-/**
- * pci_disable_acs_redir - disable ACS redirect capabilities
- * @dev: the PCI device
- *
- * For only devices specified in the disable_acs_redir parameter.
- */
-static void pci_disable_acs_redir(struct pci_dev *dev)
+struct pci_acs {
+	u16 cap;
+	u16 ctrl;
+	u16 fw_ctrl;
+};
+
+static void __pci_config_acs(struct pci_dev *dev, struct pci_acs *caps,
+			     const char *p, u16 mask, u16 flags)
  {
+	char *delimit;
  	int ret = 0;
-	const char *p;
-	int pos;
-	u16 ctrl;
- if (!disable_acs_redir_param)
+	if (!p)
  		return;
- p = disable_acs_redir_param;
  	while (*p) {
+		if (!mask) {
+			/* Check for ACS flags */
+			delimit = strstr(p, "@");
+			if (delimit) {
+				int end;
+				u32 shift = 0;
+
+				end = delimit - p - 1;
+
+				while (end > -1) {
+					if (*(p + end) == '0') {
+						mask |= 1 << shift;
+						shift++;
+						end--;
+					} else if (*(p + end) == '1') {
+						mask |= 1 << shift;
+						flags |= 1 << shift;
+						shift++;
+						end--;
+					} else if ((*(p + end) == 'x') || (*(p + end) == 'X')) {
+						shift++;
+						end--;
+					} else {
+						pci_err(dev, "Invalid ACS flags... Ignoring\n");
+						return;
+					}
+				}
+				p = delimit + 1;
+			} else {
+				pci_err(dev, "ACS Flags missing\n");
+				return;
+			}
+		}
+
+		if (mask & ~(PCI_ACS_SV | PCI_ACS_TB | PCI_ACS_RR | PCI_ACS_CR |
+			    PCI_ACS_UF | PCI_ACS_EC | PCI_ACS_DT)) {
+			pci_err(dev, "Invalid ACS flags specified\n");
+			return;
+		}
+
  		ret = pci_dev_str_match(dev, p, &p);
  		if (ret < 0) {
-			pr_info_once("PCI: Can't parse disable_acs_redir parameter: %s\n",
-				     disable_acs_redir_param);
-
+			pr_info_once("PCI: Can't parse acs command line parameter\n");
  			break;
  		} else if (ret == 1) {
  			/* Found a match */
@@ -989,56 +1026,38 @@ static void pci_disable_acs_redir(struct pci_dev *dev)
  	if (!pci_dev_specific_disable_acs_redir(dev))
  		return;
- pos = dev->acs_cap;
-	if (!pos) {
-		pci_warn(dev, "cannot disable ACS redirect for this hardware as it does not have ACS capabilities\n");
-		return;
-	}
-
-	pci_read_config_word(dev, pos + PCI_ACS_CTRL, &ctrl);
+	pci_dbg(dev, "ACS mask  = 0x%X\n", mask);
+	pci_dbg(dev, "ACS flags = 0x%X\n", flags);
- /* P2P Request & Completion Redirect */
-	ctrl &= ~(PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_EC);
+	/* If mask is 0 then we copy the bit from the firmware setting. */
+	caps->ctrl = (caps->ctrl & ~mask) | (caps->fw_ctrl & mask);
+	caps->ctrl |= flags;
- pci_write_config_word(dev, pos + PCI_ACS_CTRL, ctrl);
-
-	pci_info(dev, "disabled ACS redirect\n");
+	pci_info(dev, "Configured ACS to 0x%x\n", caps->ctrl);
  }
/**
   * pci_std_enable_acs - enable ACS on devices using standard ACS capabilities
   * @dev: the PCI device
+ * @caps: default ACS controls
   */
-static void pci_std_enable_acs(struct pci_dev *dev)
+static void pci_std_enable_acs(struct pci_dev *dev, struct pci_acs *caps)
  {
-	int pos;
-	u16 cap;
-	u16 ctrl;
-
-	pos = dev->acs_cap;
-	if (!pos)
-		return;
-
-	pci_read_config_word(dev, pos + PCI_ACS_CAP, &cap);
-	pci_read_config_word(dev, pos + PCI_ACS_CTRL, &ctrl);
-
  	/* Source Validation */
-	ctrl |= (cap & PCI_ACS_SV);
+	caps->ctrl |= (caps->cap & PCI_ACS_SV);
/* P2P Request Redirect */
-	ctrl |= (cap & PCI_ACS_RR);
+	caps->ctrl |= (caps->cap & PCI_ACS_RR);
/* P2P Completion Redirect */
-	ctrl |= (cap & PCI_ACS_CR);
+	caps->ctrl |= (caps->cap & PCI_ACS_CR);
/* Upstream Forwarding */
-	ctrl |= (cap & PCI_ACS_UF);
+	caps->ctrl |= (caps->cap & PCI_ACS_UF);
/* Enable Translation Blocking for external devices and noats */
  	if (pci_ats_disabled() || dev->external_facing || dev->untrusted)
-		ctrl |= (cap & PCI_ACS_TB);
-
-	pci_write_config_word(dev, pos + PCI_ACS_CTRL, ctrl);
+		caps->ctrl |= (caps->cap & PCI_ACS_TB);
  }
/**
@@ -1047,23 +1066,33 @@ static void pci_std_enable_acs(struct pci_dev *dev)
   */
  static void pci_enable_acs(struct pci_dev *dev)
  {
-	if (!pci_acs_enable)
-		goto disable_acs_redir;
+	struct pci_acs caps;
+	int pos;
+
+	pos = dev->acs_cap;
+	if (!pos)
+		return;
- if (!pci_dev_specific_enable_acs(dev))
-		goto disable_acs_redir;
+	pci_read_config_word(dev, pos + PCI_ACS_CAP, &caps.cap);
+	pci_read_config_word(dev, pos + PCI_ACS_CTRL, &caps.ctrl);
+	caps.fw_ctrl = caps.ctrl;
- pci_std_enable_acs(dev);
+	/* If an iommu is present we start with kernel default caps */
+	if (pci_acs_enable) {
+		if (pci_dev_specific_enable_acs(dev))
+			pci_std_enable_acs(dev, &caps);
+	}
-disable_acs_redir:
  	/*
-	 * Note: pci_disable_acs_redir() must be called even if ACS was not
-	 * enabled by the kernel because it may have been enabled by
-	 * platform firmware.  So if we are told to disable it, we should
-	 * always disable it after setting the kernel's default
-	 * preferences.
+	 * Always apply caps from the command line, even if there is no iommu.
+	 * Trust that the admin has a reason to change the ACS settings.
  	 */
-	pci_disable_acs_redir(dev);
+	__pci_config_acs(dev, &caps, disable_acs_redir_param,
+			 PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_EC,
+			 ~(PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_EC));
+	__pci_config_acs(dev, &caps, config_acs_param, 0, 0);
+
+	pci_write_config_word(dev, pos + PCI_ACS_CTRL, caps.ctrl);
  }
/**
@@ -6740,6 +6769,8 @@ static int __init pci_setup(char *str)
  				pci_add_flags(PCI_SCAN_ALL_PCIE_DEVS);
  			} else if (!strncmp(str, "disable_acs_redir=", 18)) {
  				disable_acs_redir_param = str + 18;
+			} else if (!strncmp(str, "config_acs=", 11)) {
+				config_acs_param = str + 11;
  			} else {
  				pr_err("PCI: Unknown option `%s'\n", str);
  			}
@@ -6764,6 +6795,7 @@ static int __init pci_realloc_setup_params(void)
  	resource_alignment_param = kstrdup(resource_alignment_param,
  					   GFP_KERNEL);
  	disable_acs_redir_param = kstrdup(disable_acs_redir_param, GFP_KERNEL);
+	config_acs_param = kstrdup(config_acs_param, GFP_KERNEL);
return 0;
  }

--
js
suse labs





[Index of Archives]     [Kernel Newbies]     [Security]     [Netfilter]     [Bugtraq]     [Linux FS]     [Yosemite Forum]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Video 4 Linux]     [Device Mapper]     [Linux Resources]

  Powered by Linux