Hi, Bjorn and Sinan,

On Tue, Jun 8, 2021 at 4:43 AM Sinan Kaya <okaya@xxxxxxxxxx> wrote:
>
> On 6/4/2021 12:24 PM, Huacai Chen wrote:
> >> So you need to explain why we need to allow DMA from those devices
> >> even after we shutdown the port. "It makes reboot work" is not a
> >> sufficient explanation.
> > I think only the designer of LS7A can tell us why. So, Mr. Shuai
> > Huang, could you please explain this?
> >
>
> Could there be some kind of a shutdown/init problem on your graphics
> card driver?
>
> During shutdown, remove() callback of all endpoints get called. This is
> the opportunity for your graphics driver to put hardware into safe
> state.
>
> If there is a problem with the hardware/driver, it should be a quirk as
> opposed to changing the default safe behavior for all devices.

I have had an offline discussion with Mr. Shuai Huang. He told me that
the CPU is still writing data to the framebuffer during poweroff/reboot.
If we clear the Bus Master bit at this time, the CPU waits for an ack
from the device that never arrives, so the system deadlocks.

More or less, we can modify the GPU driver to avoid this, as I said in
the commit message:

"The poweroff/reboot failures could easily be reproduced on Loongson
platforms. I think this is not a Loongson-specific problem, instead, is
a problem related to some specific PCI hosts. On some x86 platforms,
radeon/amdgpu devices can cause the same problem [1][2], and commit
faefba95c9e8ca3a ("drm/amdgpu: just suspend the hw on pci shutdown")
can resolve it.

Radeon driver is more difficult than amdgpu due to its confusing symbol
names, and I have maintained an out-of-tree patch for a long time [4]."

[1] https://bugs.freedesktop.org/show_bug.cgi?id=97980
[2] https://bugs.freedesktop.org/show_bug.cgi?id=98638
[4] https://github.com/chenhuacai/linux/commit/8da06f9b669831829416a3e9f4d1c57f217a42f0

Modifying every GPU driver is impossible for me, and I found that some
RAID controllers have problems, too.
>
> The port driver here prevents memory from getting corrupted by rogue
> hardware. There is a window during kexec where hardware can write to
> system memory addresses if IOVA addresses and system memory addresses
> overlap.
>

KEXEC has no problems, as discussed before:
http://patchwork.ozlabs.org/project/linux-pci/patch/1600680138-10949-1-git-send-email-chenhc@xxxxxxxxxx/

static void pci_device_shutdown(struct device *dev)
{
...
	if (drv && drv->shutdown)
		drv->shutdown(pci_dev);

	/*
	 * If this is a kexec reboot, turn off Bus Master bit on the
	 * device to tell it to not continue to do DMA. Don't touch
	 * devices in D3cold or unknown states.
	 * If it is not a kexec reboot, firmware will hit the PCI
	 * devices with big hammer and stop their DMA any way.
	 */
	if (kexec_in_progress && (pci_dev->current_state <= PCI_D3hot))
		pci_clear_master(pci_dev);
}

Huacai
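
P.S. To make the condition in pci_device_shutdown() concrete, here is a
minimal user-space sketch, not kernel code. The helper names
(cmd_clear_master, should_clear_master) and the local constants are
hypothetical and only mirror the kernel's values as I understand them:
Bus Master Enable is bit 2 of the PCI command register, and the power
states are ordered D0 < D1 < D2 < D3hot < D3cold < unknown.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to match the kernel: bit 2 of the PCI command register. */
#define CMD_BUS_MASTER 0x0004u

/* Assumed to match the kernel's pci_power_t ordering. */
enum power_state { D0 = 0, D1, D2, D3HOT, D3COLD, STATE_UNKNOWN };

/* Hypothetical helper: the command register value with DMA disabled. */
static uint16_t cmd_clear_master(uint16_t cmd)
{
	return (uint16_t)(cmd & ~CMD_BUS_MASTER);
}

/*
 * Hypothetical helper mirroring the quoted check: clear Bus Master only
 * on a kexec reboot, and only for devices in D0..D3hot.
 */
static bool should_clear_master(bool kexec_in_progress, enum power_state s)
{
	return kexec_in_progress && s <= D3HOT;
}
```

With this check, a normal poweroff/reboot never reaches the clear, which
is the behavior the patch above relies on.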