On 2/27/2018 5:19 PM, Bjorn Helgaas wrote:
On Tue, Feb 27, 2018 at 02:17:28PM +0200, Tal Gilboa wrote:
On 2/22/2018 10:21 PM, Bjorn Helgaas wrote:
On Mon, Jan 15, 2018 at 05:35:37PM +0200, Tal Gilboa wrote:
pcie_get_link_limits() function, which is based on
pcie_get_minimum_link(), iterates over the PCI chain
and calculates maximum BW in addition to minimum speed and
width. The BW calculation at each level is speed * width, so
even if, for instance, a level has lower width, than the device max
capabilities, it still might not cause a BW limitation if it has a
higher speed. pcie_get_minimum_link() is kept for compatibility.
Signed-off-by: Tal Gilboa <talgi@xxxxxxxxxxxx>
Reviewed-by: Tariq Toukan <tariqt@xxxxxxxxxxxx>
---
drivers/pci/pci.c | 31 ++++++++++++++++++++++++++++---
include/linux/pci.h | 8 ++++++++
2 files changed, 36 insertions(+), 3 deletions(-)
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index cf0401d..f0c60dc 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -5012,16 +5012,37 @@ int pcie_set_mps(struct pci_dev *dev, int mps)
* @speed: storage for minimum speed
* @width: storage for minimum width
*
- * This function will walk up the PCI device chain and determine the minimum
- * link width and speed of the device.
+ * This function use pcie_get_link_limits() for determining the minimum
+ * link width and speed of the device. Legacy code is kept for compatibility.
*/
int pcie_get_minimum_link(struct pci_dev *dev, enum pci_bus_speed *speed,
enum pcie_link_width *width)
{
+ int bw;
+
+ return pcie_get_link_limits(dev, speed, width, &bw);
+}
+EXPORT_SYMBOL(pcie_get_minimum_link);
+
+/**
+ * pcie_get_link_limits - determine minimum link settings of a PCI
+ device and its BW limitation
+ * @dev: PCI device to query
+ * @speed: storage for minimum speed
+ * @width: storage for minimum width
+ * @bw: storage for BW limitation
+ *
+ * This function walks up the PCI device chain and determines the link
+ * limitations - minimum width, minimum speed and max possible BW of the device.
I don't really like this interface because it mixes two separate
things: (1) the speed/width capabilities of the current device, and
(2) the minimum speed/width of the path leading to the device (and
these minimums might even be at different points in the path).
I think what we want is a way to get the max bandwidth a device can
use, i.e., pcie_get_speed_cap() * pcie_get_width_cap().
If we also had a way to find the minimum bandwidth point leading to a
device, e.g., a pcie_get_minimum_bandwidth() that returned a pci_dev
and its bandwidth (and maybe the link speed and width corresponding to
that device), then you could print a meaningful message like "this
device is capable of X, but upstream device Y limits us to Z".
I think the current pcie_print_link_status() is a little misleading
because it prints the minimum link speed and width from the path, but
those might be from different devices, which doesn't really make
sense.
I see. We started out with simply checking speed and width only for the
device. If the chain bandwidth check is misleading I'll revert back to the
original behavior. No need to calculate bandwidth IMO. Checking current
speed and width and comparing to max capabilities should be enough.
Let's think about where we want to end up. I think you want to emit a
message if either the current link width or speed is lower than the
device is capable of. You don't need the bandwidth for that.
Do you also want to emit a message if there's a bottleneck upstream
from the device? E.g., the link to the device may be x8, 5GT/s, which
is capable of a bandwidth of 40Gb/s, but if there's an upstream link
that is x4, 5GT/s, the device will be limited to a bandwidth of
20Gb/s. If you want to check for that, I think we do need to figure
the bandwidth. If we want a message in situations like this, but we
didn't figure the bandwidth, we would erroneously complain if the
upstream link were x16, 2.5GT/s, where that link has a lower speed but
is wider so it's still capable of 40Gb/s.
I think this would have been nice but would result in more questions
than answers. I think we can live without it and simply check the device
itself.
If you want a message about upstream bottlenecks, do you want to
identify the specific device that is the bottleneck?
IIRC, you're adding these:
int pcie_get_width_cap(struct pci_dev *dev, enum pcie_link_width *width)
int pcie_get_speed_cap(struct pci_dev *dev, enum pci_bus_speed *speed)
int pcie_get_link_limits(struct pci_dev *dev, enum pci_bus_speed *speed,
enum pcie_link_width *width, int *bw);
It seems like we might not need all three of these, since the last one
looks like it could subsume the first two.
This patch becomes obsolete is we can just use pcie_get_minimum_link()
as it was. pcie_get_width/speed_cap() are still needed as they calculate
the max capability. pcie_get_link_limits and pcie_get_minimum_link()
calculate the link status.
+int pcie_get_link_limits(struct pci_dev *dev, enum pci_bus_speed *speed,
+ enum pcie_link_width *width, int *bw)
+{
int ret;
*speed = PCI_SPEED_UNKNOWN;
*width = PCIE_LNK_WIDTH_UNKNOWN;
+ *bw = 0;
while (dev) {
u16 lnksta;
@@ -5042,12 +5063,16 @@ int pcie_get_minimum_link(struct pci_dev *dev, enum pci_bus_speed *speed,
if (next_width < *width)
*width = next_width;
+ /* Check if current level in the chain limits the total BW */
+ if (!(*bw) || *bw > (next_width) * PCIE_SPEED2MBS(next_speed))
+ *bw = (next_width) * PCIE_SPEED2MBS(next_speed);
+
dev = dev->bus->self;
}
return 0;
}
-EXPORT_SYMBOL(pcie_get_minimum_link);
+EXPORT_SYMBOL(pcie_get_link_limits);
/**
* pcie_get_speed_cap - queries for the PCI device's link speed capability
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 27dc070..88e23eb 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -265,6 +265,12 @@ enum pci_bus_speed {
(speed) == PCIE_SPEED_2_5GT ? "2.5 GT/s" : \
"Unknown speed")
+#define PCIE_SPEED2MBS(speed) \
+ ((speed) == PCIE_SPEED_8_0GT ? 8000 : \
+ (speed) == PCIE_SPEED_5_0GT ? 5000 : \
+ (speed) == PCIE_SPEED_2_5GT ? 2500 : \
+ 0)
+
struct pci_cap_saved_data {
u16 cap_nr;
bool cap_extended;
@@ -1088,6 +1094,8 @@ int pcie_get_minimum_link(struct pci_dev *dev, enum pci_bus_speed *speed,
enum pcie_link_width *width);
int pcie_get_speed_cap(struct pci_dev *dev, enum pci_bus_speed *speed);
int pcie_get_width_cap(struct pci_dev *dev, enum pcie_link_width *width);
+int pcie_get_link_limits(struct pci_dev *dev, enum pci_bus_speed *speed,
+ enum pcie_link_width *width, int *bw);
void pcie_flr(struct pci_dev *dev);
int __pci_reset_function_locked(struct pci_dev *dev);
int pci_reset_function(struct pci_dev *dev);
--
1.8.3.1