Hi, Bjorn,

On Fri, Jun 17, 2022 at 7:37 PM Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
>
> On Fri, Jun 17, 2022 at 10:21:14AM +0800, Huacai Chen wrote:
> > On Fri, Jun 17, 2022 at 6:57 AM Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
> > > On Thu, Jun 16, 2022 at 04:39:46PM +0800, Huacai Chen wrote:
> > > > On Thu, Jun 9, 2022 at 3:31 AM Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
> > > > > On Wed, Jun 08, 2022 at 05:34:21PM +0800, Huacai Chen wrote:
> > > > > > On Fri, Jun 3, 2022 at 12:29 AM Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
> > > > > > > On Thu, Jun 02, 2022 at 08:48:20PM +0800, Huacai Chen wrote:
> > > > > > > > On Wed, Jun 1, 2022 at 7:35 AM Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
> > > > > > > > > On Sat, Apr 30, 2022 at 04:48:45PM +0800, Huacai Chen wrote:
> > > > > > > > > > Commit cc27b735ad3a75574a ("PCI/portdrv: Turn off PCIe
> > > > > > > > > > services during shutdown") causes poweroff/reboot
> > > > > > > > > > failure on systems with LS7A chipset. We found that if
> > > > > > > > > > we remove "pci_command &= ~PCI_COMMAND_MASTER;" in
> > > > > > > > > > do_pci_disable_device(), it can work well. The hardware
> > > > > > > > > > engineer says that the root cause is that CPU is still
> > > > > > > > > > accessing PCIe devices while poweroff/reboot, and if we
> > > > > > > > > > disable the Bus Master Bit at this time, the PCIe
> > > > > > > > > > controller doesn't forward requests to downstream
> > > > > > > > > > devices, and also doesn't send TIMEOUT to CPU, which
> > > > > > > > > > causes CPU wait forever (hardware deadlock). This
> > > > > > > > > > behavior is a PCIe protocol violation (Bus Master should
> > > > > > > > > > not be involved in CPU MMIO transactions), and it will
> > > > > > > > > > be fixed in new revisions of hardware (add timeout
> > > > > > > > > > mechanism for CPU read request, whether or not Bus
> > > > > > > > > > Master bit is cleared).
> > > > > > > > > >
> > > > > > > > > LS7A might have bugs in that clearing Bus Master Enable
> > > > > > > > > prevents the root port from forwarding Memory or I/O
> > > > > > > > > requests in the downstream direction.
> > > > > > > > >
> > > > > > > > > But this feels like a bit of a band-aid because we don't
> > > > > > > > > know exactly what those requests are. If we're removing
> > > > > > > > > the Root Port, I assume we think we no longer need any
> > > > > > > > > devices *below* the Root Port.
> > > > > > > > >
> > > > > > > > > If that's not the case, e.g., if we still need to produce
> > > > > > > > > console output or save state to a device, we probably
> > > > > > > > > should not be removing the Root Port at all.
> > > > > > > > >
> > > > > > > > Do you mean it is better to skip the whole
> > > > > > > > pcie_port_device_remove() instead of just removing the
> > > > > > > > "clear bus master" operation for the buggy hardware?
> > > > > > > >
> > > > > > > No, that's not what I want at all. That's just another
> > > > > > > band-aid to avoid a problem without understanding what the
> > > > > > > problem is.
> > > > > > >
> > > > > > > My point is that apparently we remove a Root Port (which means
> > > > > > > we've already removed any devices under it), and then we try
> > > > > > > to use a device below the Root Port. That seems broken. I
> > > > > > > want to understand why we try to use a device after we've
> > > > > > > removed it.
> > > > > > >
> > > > > > I agree, and I think "why we try to use a device after remove
> > > > > > it" is because the userspace programs don't know whether a
> > > > > > device is "usable", so they just use it, at any time. Then it
> > > > > > seems it is the responsibility of the device drivers to avoid
> > > > > > the problem.
> > > > > >
> > > > > How is userspace able to use a device after the device is removed?
> > > > > E.g., if userspace does a read/write to a device that has been
> > > > > removed, the syscall should return error, not touch the missing
> > > > > device. If userspace mmaps a device, an access after the device
> > > > > has been removed should fault, not do MMIO to the missing device.
> > > > >
> > > > To give more details, let's take the graphics driver (e.g. amdgpu)
> > > > as an example again. The userspace programs call printf() to display
> > > > "shutting down xxx service" during shutdown/reboot. Or we can even
> > > > simplify further, the kernel calls printk() to display something
> > > > during shutdown/reboot. You know, printk() can happen at any time,
> > > > even after we call pcie_port_device_remove() to disable the pcie
> > > > port on the graphic card.
> > > >
> > > I've been focusing on the *remove* path, but you said the problem
> > > you're solving is with *poweroff/reboot*. pcie_portdrv_remove() is
> > > used for both paths, but if there's a reason we need those paths to be
> > > different, we might be able to split them.
> > >
> > I'm very sorry for that. I have misunderstood before because I suppose
> > the "remove path" is the pcie_portdrv_remove() function, but your
> > meaning is the .remove() callback in pcie_portdriver. Am I right this
> > time?
>
> No need to be sorry, you clearly said from the beginning that this was
> a shutdown issue, not a remove issue! I was just confused because the
> .remove() and the .shutdown() callbacks are both
> pcie_portdrv_remove(), so I was thinking "remove" even though you said
> "poweroff".
>
> > > For remove, we have to assume accesses to the device may already or
> > > will soon fail. A driver that touches the device, or a device that
> > > performs DMA, after its drv->remove() method has been called would be
> > > seriously broken. The remove operation also unbinds the driver from
> > > the device.
> >
> > Then what will happen about the "remove path"? If we still take the
> > graphics driver as an example, "rmmod amdgpu" always fails with
> > "device is busy" because the graphics card is always be used once
> > after the driver is loaded. So the "remove path" has no chance to be
> > executed.
>
> Do you think this is a problem? It doesn't sound like a problem to
> me, but I don't know anything about graphics drivers. I assume that
> if a device is in use, the expected behavior is that we can't remove
> the driver.

This isn't a problem, and I've sent V14, which only modifies the shutdown logic.

Huacai

>
> > But if we take a NIC driver as an example, "rmmod igb" can
> > mostly succeed, and there will be no access on the device after
> > removing, at least in our observation. I think there is nothing broken
> > about the "remove path".
>
> I agree.
>
> Bjorn
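
For readers following the thread, here is a rough user-space sketch of the idea of splitting the two paths so that only the remove path clears Bus Master Enable. It is not the V14 patch and not kernel code: struct mock_port, mock_port_remove() and mock_port_shutdown() are hypothetical names made up for illustration, and the "services" flag is only a stand-in for the port services. The one real value is PCI_COMMAND_MASTER (0x4, bit 2 of the PCI Command register).

```c
#include <stdbool.h>
#include <stdint.h>

/* Bus Master Enable: bit 2 of the PCI Command register (real value). */
#define PCI_COMMAND_MASTER 0x4

/* Hypothetical stand-in for a Root Port; NOT a kernel structure. */
struct mock_port {
    uint16_t command;       /* mock copy of the PCI Command register */
    bool services_enabled;  /* mock flag for the port services */
};

/*
 * Remove path (assumption): the devices below the port are already gone,
 * so stop the services and clear Bus Master Enable.
 */
static void mock_port_remove(struct mock_port *port)
{
    port->services_enabled = false;
    port->command &= (uint16_t)~PCI_COMMAND_MASTER;
}

/*
 * Shutdown path (assumption): stop the services but leave the Command
 * register untouched, so late accesses such as console output are not
 * affected by a cleared Bus Master bit on the buggy hardware.
 */
static void mock_port_shutdown(struct mock_port *port)
{
    port->services_enabled = false;
}
```

Calling mock_port_remove() on a port whose command register is 0x0007 leaves 0x0003, while mock_port_shutdown() leaves the register unchanged; both turn the mock services flag off.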