On Tue, Nov 20, 2018 at 10:13 AM Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote: > > On Tue, Nov 20, 2018 at 09:17:52AM -0500, Alex Deucher wrote: > > On Mon, Nov 19, 2018 at 7:47 PM Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote: > > > On Tue, Oct 30, 2018 at 12:36:08PM -0400, Mikulas Patocka wrote: > > > > The macros PCI_EXP_LNKCAP_SLS_*GB are values, not bit masks. We must mask > > > > the register and compare it against them. > > > > > > > > This patch fixes errors "amdgpu: [powerplay] failed to send message 261 > > > > ret is 0" errors when PCIe-v3 card is plugged into PCIe-v1 slot, because > > > > the slot is being incorrectly reported as PCIe-v3 capable. > > > > > > > > Signed-off-by: Mikulas Patocka <mpatocka@xxxxxxxxxx> > > > > Fixes: 6cf57be0f78e ("PCI: Add pcie_get_speed_cap() to find max supported link speed") > > > > Cc: stable@xxxxxxxxxxxxxxx # v4.17+ > > > > > > > > --- > > > > drivers/pci/pci.c | 8 ++++---- > > > > 1 file changed, 4 insertions(+), 4 deletions(-) > > > > > > > > Index: linux-4.19/drivers/pci/pci.c > > > > =================================================================== > > > > --- linux-4.19.orig/drivers/pci/pci.c 2018-10-30 16:58:58.000000000 +0100 > > > > +++ linux-4.19/drivers/pci/pci.c 2018-10-30 16:58:58.000000000 +0100 > > > > @@ -5492,13 +5492,13 @@ enum pci_bus_speed pcie_get_speed_cap(st > > > > > > > > pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &lnkcap); > > > > if (lnkcap) { > > > > - if (lnkcap & PCI_EXP_LNKCAP_SLS_16_0GB) > > > > + if ((lnkcap & PCI_EXP_LNKCAP_SLS) == PCI_EXP_LNKCAP_SLS_16_0GB) > > > > return PCIE_SPEED_16_0GT; > > > > - else if (lnkcap & PCI_EXP_LNKCAP_SLS_8_0GB) > > > > + else if ((lnkcap & PCI_EXP_LNKCAP_SLS) == PCI_EXP_LNKCAP_SLS_8_0GB) > > > > return PCIE_SPEED_8_0GT; > > > > - else if (lnkcap & PCI_EXP_LNKCAP_SLS_5_0GB) > > > > + else if ((lnkcap & PCI_EXP_LNKCAP_SLS) ==PCI_EXP_LNKCAP_SLS_5_0GB) > > > > return PCIE_SPEED_5_0GT; > > > > - else if (lnkcap & PCI_EXP_LNKCAP_SLS_2_5GB) > > > > + else if ((lnkcap & PCI_EXP_LNKCAP_SLS) == PCI_EXP_LNKCAP_SLS_2_5GB) > > > > return PCIE_SPEED_2_5GT; > > > > } > > > > > > > > > > I'd like to apply this as below, where I removed the 8_0GB and 16_0GB > > > cases as recommended by the spec. I can't test it myself, and the > > > bugzillas don't contain enough information for me to confirm that the > > > patch below is enough (the "lspci -vv" output of the root port and GPU > > > is what I would need). > > > > > > I'm confused about the fact that 6cf57be0f78e appeared in v4.17, but > > > v4.18 works fine according to both bugzillas. > > > > This issue affects AMD GPUs because we switched from using an open > > coded check for pcie link speeds in the driver to using the common > > pcie variants in > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v4.19&id=5d9a6330403271fbb1244f14380a7cc44662796f > > OK, thanks. I added that SHA1 and a note to explain the connection. > > > The patch below would regress performance, at least on AMD GPUs, since > > we'd end up reporting a max speed of gen 2 (5 GT/s) which would cause > > the driver to limit the speed to gen2 even if gen3 or 4 are available. > > I guess this means these parts are broken with respect to the spec, > since they support gen3 speeds but don't implement LnkCap2? Sorry, I mis-read the patch. The patch is correct and our parts are compliant. We are only hitting this issue on non-gen3 platforms which would end up in the second case. Sorry for the confusion. You new patch is: Acked-by: Alex Deucher <alexander.deucher@xxxxxxx> Alex > > Gen2, i.e., PCIe r2.1, only defined 2.5 GT/s and 5 GT/s. Devices > capable of the higher speeds added by PCIe r3.0 are supposed to > implement LnkCap2, but if we're even getting to this code, it means > LnkCap2 was zero. > > If we confirm that this is a device defect, the question is what the > best way to work around it is. Probably the original patch is easier > than some sort of quirk, but we'd need to expand the comment a little > bit to explain why we're not following the spec recommendation. > > It looks like lspci probably also needs some updating here -- it > currently doesn't do anything at all with LnkCap2. > > > > I also don't have a good feel for whether this is urgent enough to be > > > a v4.20 fix or whether it can wait for v4.21. Evidence either way > > > would help. > > > > I'd like it to land for 4.19 and 4.20 at least. Alternatively, we > > could revert all of the drm patches to and bring back all the open > > coded implementations, but it's a fairly large number of patches to > > revert. > > OK, sounds like it makes sense to do this for v4.20 and backport it at > least to v4.19 stable. I do want to get the places below fixed also. > They may not be as urgent, but we might as well try and make > everything consistent while we're looking at it. > > > > We also need similar fixes in pci_set_bus_speed(), pcie_speeds() > > > (hfi1), cobalt_pcie_status_show(), hba_ioctl_callback(), > > > qla24xx_pci_info_str(), and maybe a couple other places. > > > > > > Bjorn > > > > > > > > > commit 871f73abf4b8e6aee8a206775f944ede7c7d7250 > > > Author: Mikulas Patocka <mpatocka@xxxxxxxxxx> > > > Date: Tue Oct 30 12:36:08 2018 -0400 > > > > > > PCI: Fix incorrect value returned from pcie_get_speed_cap() > > > > > > The macros PCI_EXP_LNKCAP_SLS_*GB are values, not bit masks. We must mask > > > the register and compare it against them. > > > > > > This patch fixes errors "amdgpu: [powerplay] failed to send message 261 ret > > > is 0" errors when PCIe-v3 card is plugged into PCIe-v1 slot, because the > > > slot is being incorrectly reported as PCIe-v3 capable. > > > > > > Fixes: 6cf57be0f78e ("PCI: Add pcie_get_speed_cap() to find max supported link speed") > > > Link: https://bugs.freedesktop.org/show_bug.cgi?id=108704 > > > Link: https://bugs.freedesktop.org/show_bug.cgi?id=108778 > > > Signed-off-by: Mikulas Patocka <mpatocka@xxxxxxxxxx> > > > [bhelgaas: update comment, remove use of PCI_EXP_LNKCAP_SLS_8_0GB and > > > PCI_EXP_LNKCAP_SLS_16_0GB since those should be covered by PCI_EXP_LNKCAP2] > > > Signed-off-by: Bjorn Helgaas <bhelgaas@xxxxxxxxxx> > > > Acked-by: Alex Deucher <alexander.deucher@xxxxxxx> > > > Cc: stable@xxxxxxxxxxxxxxx # v4.17+ > > > > > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c > > > index d068f11d08a7..8563d1d9b102 100644 > > > --- a/drivers/pci/pci.c > > > +++ b/drivers/pci/pci.c > > > @@ -5556,9 +5556,13 @@ enum pci_bus_speed pcie_get_speed_cap(struct pci_dev *dev) > > > u32 lnkcap2, lnkcap; > > > > > > /* > > > - * PCIe r4.0 sec 7.5.3.18 recommends using the Supported Link > > > - * Speeds Vector in Link Capabilities 2 when supported, falling > > > - * back to Max Link Speed in Link Capabilities otherwise. > > > + * Link Capabilities 2 was added in PCIe r3.0, sec 7.8.18. The > > > + * implementation note there recommends using the Supported Link > > > + * Speeds Vector in Link Capabilities 2 when supported. > > > + * > > > + * Without Link Capabilities 2, i.e., prior to PCIe r3.0, software > > > + * should use the Supported Link Speeds field in Link Capabilities, > > > + * where only 2.5 GT/s and 5.0 GT/s speeds were defined. > > > */ > > > pcie_capability_read_dword(dev, PCI_EXP_LNKCAP2, &lnkcap2); > > > if (lnkcap2) { /* PCIe r3.0-compliant */ > > > @@ -5575,13 +5579,9 @@ enum pci_bus_speed pcie_get_speed_cap(struct pci_dev *dev) > > > > > > pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &lnkcap); > > > if (lnkcap) { > > > - if (lnkcap & PCI_EXP_LNKCAP_SLS_16_0GB) > > > - return PCIE_SPEED_16_0GT; > > > - else if (lnkcap & PCI_EXP_LNKCAP_SLS_8_0GB) > > > - return PCIE_SPEED_8_0GT; > > > - else if (lnkcap & PCI_EXP_LNKCAP_SLS_5_0GB) > > > + if ((lnkcap & PCI_EXP_LNKCAP_SLS) == PCI_EXP_LNKCAP_SLS_5_0GB) > > > return PCIE_SPEED_5_0GT; > > > - else if (lnkcap & PCI_EXP_LNKCAP_SLS_2_5GB) > > > + else if ((lnkcap & PCI_EXP_LNKCAP_SLS) == PCI_EXP_LNKCAP_SLS_2_5GB) > > > return PCIE_SPEED_2_5GT; > > > } > > >