Thanks Bjorn for reviewing patch. On 4/30/21 12:01 PM, Bjorn Helgaas wrote: > External email: Use caution opening links or attachments > > > On Wed, Apr 28, 2021 at 07:49:07PM -0500, Shanker Donthineni wrote: >> On select platforms, some Nvidia GPU devices do not work with SBR. >> Triggering SBR would leave the device inoperable for the current >> system boot. It requires a system hard-reboot to get the GPU device >> back to normal operating condition post-SBR. For the affected >> devices, enable NO_BUS_RESET quirk to fix the issue. > Since 1/2 adds _RST support, should I infer that _RST works on these > Nvidia GPUs even though SBR does not? If so, how does _RST do the > reset? Yes, _RST method works but not SBR. The _RST method in DSDT-AML uses platform-specific initialization steps outside of the GPU BARs for resetting the GPU device. > Do you have a root cause for why SBR doesn't work? It is a hardware implementation specific issue. GPU end-point device is inoperative after receiving SBR from the RP/SwitchPort. This quirk is to prevent SBR. > I'm not super > confident that we perform resets correctly in general, and if the > problem is an issue in Linux, it'd be nice to fix that. We have not seen any issue with Linux SBR implementation. > >> This issue will be fixed in the next generation of hardware. >> >> Signed-off-by: Shanker Donthineni <sdonthineni@xxxxxxxxxx> >> --- >> Changes since v1: >> - Split patch into 2, code for handling _RST and SBR specific quirk >> - The RST based reset is called as a first-class mechanism in the reset code path >> >> drivers/pci/quirks.c | 12 ++++++++++++ >> 1 file changed, 12 insertions(+) >> >> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c >> index 8f47d139c381..e1216a8165df 100644 >> --- a/drivers/pci/quirks.c >> +++ b/drivers/pci/quirks.c >> @@ -3910,6 +3910,18 @@ static int delay_250ms_after_flr(struct pci_dev *dev, int probe) >> return 0; >> } >> >> +/* >> + * Some Nvidia GPU devices do not work with bus reset, SBR needs to be >> + * prevented for those affected devices. >> + */ >> +static void quirk_nvidia_no_bus_reset(struct pci_dev *dev) >> +{ >> + if ((dev->device & 0xffc0) == 0x2340) >> + dev->dev_flags |= PCI_DEV_FLAGS_NO_BUS_RESET; >> +} >> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_NVIDIA, PCI_ANY_ID, >> + quirk_nvidia_no_bus_reset); > Can you move this next to the existing quirk_no_bus_reset(), and maybe > even just call quirk_no_bus_reset(), e.g., > > if ((dev->device & 0xffc0) == 0x2340) > quirk_no_bus_reset(dev); > > It doesn't look connected to this spot. > Thanks, I will move next to the existing NO_BUS_RESET quirks. >> static const struct pci_dev_reset_methods pci_dev_reset_methods[] = { >> { PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_82599_SFP_VF, >> reset_intel_82599_sfp_virtfn }, >> -- >> 2.17.1 >>