Hi, > -----Original Message----- > From: Alex Williamson <alex.williamson@xxxxxxxxxx> > Sent: Tuesday, November 27, 2018 9:39 PM > To: Bjorn Helgaas <helgaas@xxxxxxxxxx> > Cc: Bharat Bhushan <bharat.bhushan@xxxxxxx>; linux-pci@xxxxxxxxxxxxxxx; > linux-kernel@xxxxxxxxxxxxxxx; bharatb.yadav@xxxxxxxxx; David Daney > <david.daney@xxxxxxxxxx>; Jan Glauber <jglauber@xxxxxxxxxx>; Maik > Broemme <mbroemme@xxxxxxxxxx>; Chris Blake > <chrisrblake93@xxxxxxxxx> > Subject: Re: [PATCH] PCI: Mark NXP LS1088 to avoid bus reset bus > > On Tue, 27 Nov 2018 09:33:56 -0600 > Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote: > > > [+cc David, Jan, Alex, Maik, Chris] > > > > On Tue, Nov 27, 2018 at 08:46:33AM +0000, Bharat Bhushan wrote: > > > NXP (Freescale Vendor ID) LS1088 chips do not behave correctly after > > > bus reset with e1000e. Link state of device does not comes UP and so > > > config space never accessible again. > > > > Previous similar commits: > > > > 822155100e58 ("PCI: Mark Cavium CN8xxx to avoid bus reset") > > 8e2e03179923 ("PCI: Mark Atheros AR9580 to avoid bus reset") > > 9ac0108c2bac ("PCI: Mark Atheros AR9485 and QCA9882 to avoid bus > reset") > > c3e59ee4e766 ("PCI: Mark Atheros AR93xx to avoid bus reset") > > > > 1) Please make your subject match (remove the spurious "bus" at the > > end) Will correct, added by mistake > > > > 2) This should probably be marked for stable (v3.14 and later, since > > the quirk itself appeared in v3.19 and marked for v3.14 and later > > stable kernels). Maybe even mark it as "Fixes: c3e59ee4e766..." to > > connect it. Ok, > > > > 3) The 1957:80c0 PCI ID doesn't appear in > > > https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpci > - > ids.ucw.cz%2F&data=02%7C01%7Cbharat.bhushan%40nxp.com%7C296 > 02a2efa584249221808d65482945b%7C686ea1d3bc2b4c6fa92cd99c5c301635%7 > C0%7C0%7C636789317139032063&sdata=3jkRMa1NljSCp%2BvZP0kgz7D > PWPJZH8d7JXhCE5vCCMk%3D&reserved=0; can you add it? > > Yes, I will add > > 4) Is there a hardware erratum for this? If so, please include the > > URL here. No h/w errata as of now. > > > > 5) Can you reproduce the problem using the same endpoint (e1000e) on a > > different system with a different bridge? I have multiple LS1088 boards and I can observe problem with all LS1088 boards. While when I uses same PCI device on other NXP board (LS2088) then it works fine. > > > > 6) Have you looked at this with a PCIe analyzer? It would be very > > interesting to compare the boot-time or system reboot path with the > > individual bus reset path you're fixing. I have not used PCIe analyzer, > > > > Since there are several similar reports and they sometimes involve the > > same devices (both your patch and 822155100e58 mention e1000e), I'm a > > little suspicious that we're doing something wrong in the bus reset > > path. > > I agree, entirely excluding bus resets is not something to be taken lightly. It's > less than ideal for an endpoint and a fairly major functional gap for a > downstream port. It should really be considered a last resort. > > > I think bus reset uses Secondary Bus Reset in the Bridge Control > > register. That's a generic mechanism that I would expect to be pretty > > well-tested. I suspect the BIOS probably uses it in the reboot path, > > and the device probably works after that. > > > > So I wonder if the Linux delay isn't quite long enough, or our first > > access to the device isn't quite right, e.g., maybe there's some issue > > with the bus/device number capture (PCIe r4.0, sec 2.2.6.2). > > Tweaking the delay would be a reasonable solution, though we are seeing > some issues where users with lots of assigned devices that require bus > resets experience long delays as vfio file descriptors are closed sequentially > on exit. In pci_reset_secondary_bus() I have tried to increase the delay after reset but not helped. Do I need to add delay at some other place as well? Thanks -Bharat > So perhaps we could flag downstream ports requiring an extra delay, > if that becomes a solution. Your mention of the bus/device number also > reminds me of the issue we saw on Threadripper where there were patches > proposed to re-write the secondary and subordinate bus numbers after > reset. AMD was able to resolve that in a firmware update, but there could > be something similar occurring here. Thanks, > > Alex > > > > Signed-off-by: Bharat Bhushan <Bharat.Bhushan@xxxxxxx> > > > --- > > > drivers/pci/quirks.c | 7 +++++++ > > > 1 file changed, 7 insertions(+) > > > > > > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c index > > > 4700d24e5d55..b9ae4e9f101a 100644 > > > --- a/drivers/pci/quirks.c > > > +++ b/drivers/pci/quirks.c > > > @@ -3391,6 +3391,13 @@ > DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATHEROS, 0x0033, > quirk_no_bus_reset); > > > */ > > > DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_CAVIUM, 0xa100, > > > quirk_no_bus_reset); > > > > > > +/* > > > + * NXP (Freescale Vendor ID) LS1088 chips do not behave correctly > > > +after > > > + * bus reset. Link state of device does not comes UP and so config > > > +space > > > + * never accessible again. > > > + */ > > > +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_FREESCALE, 0x80c0, > > > +quirk_no_bus_reset); > > > + > > > static void quirk_no_pm_reset(struct pci_dev *dev) { > > > /* > > > -- > > > 2.19.1 > > >