Re: [PATCH V2 1/4] PCI/portdrv: Don't disable device during shutdown

Hi, Bjorn and Sinan,

On Tue, Jun 8, 2021 at 4:43 AM Sinan Kaya <okaya@xxxxxxxxxx> wrote:
>
> On 6/4/2021 12:24 PM, Huacai Chen wrote:
> >> So you need to explain why we need to allow DMA from those devices
> >> even after we shutdown the port.  "It makes reboot work" is not a
> >> sufficient explanation.
> > I think only the designer of LS7A can tell us why. So, Mr. Shuai
> > Huang, could you please explain this?
> >
>
> Could there be some kind of a shutdown/init problem on your graphics
> card driver?
>
> During shutdown, remove() callback of all endpoints get called. This is
> the opportunity for your graphics driver to put hardware into safe
> state.
>
> If there is a problem with the hardware/driver, it should be a quirk as
> opposed to changing the default safe behavior for all devices.
I had an offline discussion with Mr. Shuai Huang. He told me that the
CPU is still writing data to the framebuffer during poweroff/reboot. If
we clear the Bus Master bit at that point, the CPU waits for an ack from
the device that never arrives, so the system deadlocks.

We can more or less avoid this by modifying the GPU driver, as I said in
the commit message:

"The poweroff/reboot failures could easily be reproduced on Loongson
platforms. I think this is not a Loongson-specific problem, instead, is
a problem related to some specific PCI hosts. On some x86 platforms,
radeon/amdgpu devices can cause the same problem [1][2], and commit
faefba95c9e8ca3a ("drm/amdgpu: just suspend the hw on pci shutdown")
can resolve it.

Radeon driver is more difficult than amdgpu due to its confusing symbol
names, and I have maintained an out-of-tree patch for a long time [4]."

[1] https://bugs.freedesktop.org/show_bug.cgi?id=97980
[2] https://bugs.freedesktop.org/show_bug.cgi?id=98638
[4] https://github.com/chenhuacai/linux/commit/8da06f9b669831829416a3e9f4d1c57f217a42f0

Modifying every GPU driver is not feasible for me, and I have found that
some RAID controllers have problems, too.

>
> The port driver here prevents memory from getting corrupted by rogue
> hardware. There is a window during kexec where hardware can write to
> system memory addresses if IOVA addresses and system memory addresses
> overlap.
>
kexec is not affected, as discussed before:
http://patchwork.ozlabs.org/project/linux-pci/patch/1600680138-10949-1-git-send-email-chenhc@xxxxxxxxxx/

For a kexec reboot, pci_device_shutdown() still clears the Bus Master
bit on each device:

static void pci_device_shutdown(struct device *dev)
{
         ...
         if (drv && drv->shutdown)
                 drv->shutdown(pci_dev);

         /*
          * If this is a kexec reboot, turn off Bus Master bit on the
          * device to tell it to not continue to do DMA. Don't touch
          * devices in D3cold or unknown states.
          * If it is not a kexec reboot, firmware will hit the PCI
          * devices with big hammer and stop their DMA any way.
          */
         if (kexec_in_progress && (pci_dev->current_state <= PCI_D3hot))
                 pci_clear_master(pci_dev);
}
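
To make the condition above concrete, here is a rough user-space model
of it. This is only a sketch, not kernel code: the model_* names, the
struct and the enum are simplified stand-ins assumed for illustration.
Only the check itself mirrors pci_device_shutdown().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for kernel definitions (hypothetical names). */
enum model_power_state {
	MODEL_D0,
	MODEL_D1,
	MODEL_D2,
	MODEL_D3HOT,
	MODEL_D3COLD,
	MODEL_UNKNOWN
};

/* Bus Master Enable is bit 2 of the PCI Command register. */
#define MODEL_COMMAND_MASTER 0x4u

struct model_pci_dev {
	enum model_power_state current_state;
	uint16_t command;	/* models the PCI Command register */
};

/* Clear only the Bus Master bit, leaving the other command bits alone. */
void model_clear_master(struct model_pci_dev *dev)
{
	dev->command &= (uint16_t)~MODEL_COMMAND_MASTER;
}

/*
 * Same condition as in pci_device_shutdown(): Bus Master is cleared
 * only for a kexec reboot, and only for devices that are not in
 * D3cold or an unknown state.
 */
void model_device_shutdown(struct model_pci_dev *dev, bool kexec_in_progress)
{
	assert(dev != NULL);

	if (kexec_in_progress && dev->current_state <= MODEL_D3HOT)
		model_clear_master(dev);
}
```

In this model, a normal poweroff/reboot (kexec_in_progress == false)
leaves the command register unchanged, while a kexec reboot clears bit 2
for any device in D0 through D3hot.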

Huacai


