[+cc Manivannan]

On Tue, May 30, 2023 at 10:26:29AM +0800, Owen Yang wrote:
> Implement this workaround to correct NVMe suspend process.
>
> SSD will randomly crashed at 100~250+ suspend/resume cycle. Phison and
> Qualcomm found that its due to NVMe entering D3cold instead of L1ss.
> https://partnerissuetracker.corp.google.com/issues/275663637
>
> According to Qualcomm. This issue has been found last year and they have
> attempt to submit some patches to fix the pci suspend behavior.
> (ref:https://patchwork.kernel.org/project/linux-arm-msm/list/?
> series=665060&state=%2A&archive=both).
> But somehow these patches were rejected because of its complexity. And
> we've got advise from Google that it will be more efficient that we
> implement a quirks to fix this issue.
>
> The DECLARE_PCI_FIXUP_SUSPEND function has already specify the PCI device
> ID. And this SSD will only be used at our Chromebook device only.

I'll wait for your response to Manivannan:
https://lore.kernel.org/r/20230529164856.GE5633@thinkpad

If the issue is caused by an out-of-tree patch, this fix needs to stay
with that patch. If the issue doesn't happen with the current
upstream kernel, this patch isn't relevant for upstream.

> Signed-off-by: Owen Yang <ecs.taipeikernel@xxxxxxxxx>
> ---
>
> Changes in v3:
> - Adjust comment about issue behavior, ASPM, and the sc7280 connection.
> - Fix multi-line code comment.
>
> Changes in v2:
> - Fix subject line style from "drivers: pci: quirks:" to "PCI:"
> - Adjust comment.
>
>  drivers/pci/quirks.c | 15 +++++++++++++++
>  1 file changed, 15 insertions(+)
>
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index f4e2a88729fd..6d895a4da4b5 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -5945,6 +5945,21 @@ static void nvidia_ion_ahci_fixup(struct pci_dev *pdev)
>  }
>  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_NVIDIA, 0x0ab8, nvidia_ion_ahci_fixup);
>
> +/*
> + * In Qualcomm 7c gen 3 sc7280 platform. Some of the SSD will enter
> + * D3cold instead of L1ss.It cause the device will randomly crash after
> + * suspend within 100~250+ cycles of suspend/resume test.
> + *
> + * After adding this fixup.We've verified that 10 devices passed
> + * the suspend/resume 2500 cycles test.
> + */
> +static void phison_suspend_fixup(struct pci_dev *pdev)
> +{
> +	msleep(30);
> +}
> +DECLARE_PCI_FIXUP_SUSPEND(0x1987, 0x5013, phison_suspend_fixup);
> +DECLARE_PCI_FIXUP_SUSPEND(0x1987, 0x5015, phison_suspend_fixup);
> +
>  static void rom_bar_overlap_defect(struct pci_dev *dev)
>  {
>  	pci_info(dev, "working around ROM BAR overlap defect\n");
> --
> 2.17.1
>