Re: [PATCH V13 5/6] PCI: Add quirk for LS7A to avoid reboot failure

On Thu, Jun 02, 2022 at 08:48:20PM +0800, Huacai Chen wrote:
> Hi, Bjorn,
> 
> On Wed, Jun 1, 2022 at 7:35 AM Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
> >
> > On Sat, Apr 30, 2022 at 04:48:45PM +0800, Huacai Chen wrote:
> > > Commit cc27b735ad3a75574a ("PCI/portdrv: Turn off PCIe services
> > > during shutdown") causes poweroff/reboot failure on systems with
> > > LS7A chipset.  We found that if we remove "pci_command &=
> > > ~PCI_COMMAND_MASTER;" in do_pci_disable_device(), it can work
> > > > well. The hardware engineer says that the root cause is that the CPU
> > > > is still accessing PCIe devices during poweroff/reboot, and if we
> > > > disable the Bus Master Bit at this time, the PCIe controller
> > > > doesn't forward requests to downstream devices, and also doesn't
> > > > send TIMEOUT to the CPU, which causes the CPU to wait forever
> > > > (hardware deadlock). This behavior is a PCIe protocol violation (Bus
> > > Master should not be involved in CPU MMIO transactions), and it
> > > will be fixed in new revisions of hardware (add timeout
> > > mechanism for CPU read request, whether or not Bus Master bit is
> > > cleared).
> >
> > LS7A might have bugs in that clearing Bus Master Enable prevents the
> > root port from forwarding Memory or I/O requests in the downstream
> > direction.
> >
> > But this feels like a bit of a band-aid because we don't know exactly
> > what those requests are.  If we're removing the Root Port, I assume we
> > think we no longer need any devices *below* the Root Port.
> >
> > If that's not the case, e.g., if we still need to produce console
> > output or save state to a device, we probably should not be removing
> > the Root Port at all.
>
> Do you mean it is better to skip the whole pcie_port_device_remove()
> instead of just removing the "clear bus master" operation for the
> buggy hardware?

No, that's not what I want at all.  That's just another band-aid to
avoid a problem without understanding what the problem is.

My point is that apparently we remove a Root Port (which means we've
already removed any devices under it), and then we try to use a device
below the Root Port.  That seems broken.  I want to understand why we
try to use a device after we've removed it.

If the scenario ends up being legitimate and unavoidable, fine -- we
can figure out a quirk to work around the fact that the LS7A doesn't
allow that access after we clear Bus Master Enable.  But right now the
scenario smells like a latent bug, and leaving bus mastering enabled
just avoids it without fixing it.

Bjorn
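
For readers following the thread, here is a minimal userspace sketch of
the bit operation under discussion: clearing PCI_COMMAND_MASTER in a
device's Command register on disable, and what a quirk that skips the
clear would amount to.  It is not the kernel's code.  The struct, the
keep_bus_master flag, and the function name are hypothetical and exist
only for illustration; the three PCI_COMMAND_* values are the standard
Command register bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Standard PCI Command register bits. */
#define PCI_COMMAND_IO     0x1 /* respond to I/O space accesses */
#define PCI_COMMAND_MEMORY 0x2 /* respond to memory space accesses */
#define PCI_COMMAND_MASTER 0x4 /* Bus Master Enable */

/* Hypothetical stand-in for a PCI device; not the kernel's struct pci_dev. */
struct fake_pci_dev {
	uint16_t command;     /* simulated Command register value */
	bool keep_bus_master; /* hypothetical quirk flag for LS7A-like ports */
};

/*
 * Return the Command register value that a disable step would write back.
 * Bus Master Enable is cleared unless the (hypothetical) quirk flag asks
 * for it to be left alone.  All other bits are preserved.
 */
static uint16_t command_after_disable(const struct fake_pci_dev *dev)
{
	uint16_t cmd;

	assert(dev != NULL);
	cmd = dev->command;

	if ((cmd & PCI_COMMAND_MASTER) && !dev->keep_bus_master)
		cmd &= (uint16_t)~PCI_COMMAND_MASTER;

	return cmd;
}
```

Note that the sketch only models the workaround (leaving the bit set).
It says nothing about why a device below the Root Port is still being
accessed after removal, which is the question raised above.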


