[AMD Public Use] > -----Original Message----- > From: Bjorn Helgaas <helgaas@xxxxxxxxxx> > Sent: Sunday, September 13, 2020 12:05 PM > To: Huacai Chen <chenhc@xxxxxxxxxx> > Cc: Bjorn Helgaas <bhelgaas@xxxxxxxxxx>; Deucher, Alexander > <Alexander.Deucher@xxxxxxx>; linux-pci@xxxxxxxxxxxxxxx; Huacai Chen > <chenhuacai@xxxxxxxxx>; Jiaxun Yang <jiaxun.yang@xxxxxxxxxxx>; Tiezhu > Yang <yangtiezhu@xxxxxxxxxxx> > Subject: Re: [PATCH 2/2] PCI/portdrv: Don't disable pci device during > shutdown > > On Fri, Sep 11, 2020 at 06:09:37PM +0800, Huacai Chen wrote: > > Don't call pci_disable_device() in pcie_port_device_remove() during > > the portdrv's shutdown. This can avoid some poweroff/reboot failures. > > > > The poweroff/reboot failures can easily reproduce on Loongson platforms. > > I think this is not a Loongson-specific problem, instead, is a problem > > related to some specific PCI hosts. On some x86 platforms, > > radeon/amdgpu devices can cause the same problem, and commit > > faefba95c9e8ca3a523831c2e > > ("drm/amdgpu: just suspend the hw on pci shutdown") can resolve it. > > > > Radeon driver is more difficult than amdgpu due to its confusing > > symbol names, and I have maintained an out-of-tree patch for a long time > [1]. > > Recently, we found more and more devices can cause the same problem, > > and it is very difficult to modify all problematic drivers as > > radeon/amdgpu does. So, I think modify the PCIe port driver is a > > simple and effective way. > > This needs to explain in more detail what the failure is and how this patch > fixes it. > > The main thing pci_disable_device() does is turn off bus mastering, so I > assume this has to do with DMA during shutdown or reboot. This change is in > portdrv, so it affects PCIe Root Ports and Switch Ports, which of course are > type 1 (bridge) devices. Clearing PCI_COMMAND_MASTER on bridges > prevents them from forwarding memory or I/O requests in the upstream > direction, i.e., it prevents DMA from devices below the bridge. > > But if the problem is DMA, the same problem may occur with Root Complex > Integrated Endpoints or conventional PCI devices, since portdrv may not be > involved in those topologies. I'm not sure I understand what the point of this patch is. Isn't the whole point of the shutdown callback to cleanly tear down whatever tasks are happening on the device? It could be DMA or stuff happening on the device itself (e.g., microcontrollers on the devices, etc.). Most of the complications in GPU drivers are due to the lifetime issues between drm and other subsystems. Alex > > > [1] > > > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgith > > > ub.com%2Fchenhuacai%2Flinux%2Fcommit%2F6612f9c1fc290d42a14618ce9a > 7d030 > > > 14d8ebb1a&data=02%7C01%7Calexander.deucher%40amd.com%7C1cc > 5cca01b5 > > > 74485a0aa08d857feb88e%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C > 0%7C637 > > > 356098841869775&sdata=HJmniTV2jJLEiOh3UCFpqPuGeq38y6crax2ahZa > 5Eqc% > > 3D&reserved=0 > > > > Signed-off-by: Huacai Chen <chenhc@xxxxxxxxxx> > > Signed-off-by: Tiezhu Yang <yangtiezhu@xxxxxxxxxxx> > > --- > > drivers/pci/pcie/portdrv_core.c | 1 - > > 1 file changed, 1 deletion(-) > > > > diff --git a/drivers/pci/pcie/portdrv_core.c > > b/drivers/pci/pcie/portdrv_core.c index 50a9522..1991aca 100644 > > --- a/drivers/pci/pcie/portdrv_core.c > > +++ b/drivers/pci/pcie/portdrv_core.c > > @@ -491,7 +491,6 @@ void pcie_port_device_remove(struct pci_dev > *dev) > > { > > device_for_each_child(&dev->dev, NULL, remove_iter); > > pci_free_irq_vectors(dev); > > - pci_disable_device(dev); > > } > > > > /** > > -- > > 2.7.0 > >