Re: pci_error_handlers for pci_host_bridge ?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Bjorn,
Understood;
Thanks for the insights.

For now we will fix with ISR patch to root bridge driver locally till
we get a proper Hardware fix.

Thanks.


On Fri, Jun 8, 2018 at 6:06 PM, Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
> On Fri, Jun 08, 2018 at 05:27:12PM +0530, Subrahmanya Lingappa wrote:
>> Bjorn,
>>
>> On Thu, Jun 7, 2018 at 7:29 PM, Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
>> > On Thu, Jun 07, 2018 at 04:20:10PM +0530, poza@xxxxxxxxxxxxxx wrote:
>> >> On 2018-06-07 15:45, Subrahmanya Lingappa wrote:
>> >> > Bjorn,
>> >> >
>> >> > On Wed, Jun 6, 2018 at 6:20 PM, Bjorn Helgaas <helgaas@xxxxxxxxxx>
>> >> > wrote:
>> >> > > Hi Subrahmanya,
>> >> > >
>> >> > > On Wed, Jun 06, 2018 at 05:57:17PM +0530, Subrahmanya Lingappa wrote:
>> >> > > > Hi,
>> >> > > > according to https://github.com/torvalds/linux/blob/master/Documentation/PCI/pci-error-recovery.txt
>> >> > > >
>> >> > > > as part of AER handling, struct pci_error_handlers{} is
>> >> > > > implemented by
>> >> > > > endpoint drivers to handle device specific recovery steps for "struct
>> >> > > > pci_driver".
>> >> > > >
>> >> > > > But we have a platform_driver which implements "struct
>> >> > > > pci_host_bridge" which also supports AER capability how can we
>> >> > > > support
>> >> > > > pci_error_handlers() for the host bridge drivers ?
>> >> > >
>> >> > > I assume you're referring to Mobiveil.  Can you explain more of the
>> >> > > topology here?  Can you also include "sudo lspci -vv" output?
>> >> > >
>> >> > Yes, it is for Mobiveil's Host bridge driver :
>> >> > https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/drivers/pci/host/pcie-mobiveil.c
>> >> > lspci output is now is not available, I'll try to get it sooner.
>> >> >
>> >> > we have an endpoint connected directly to Rootport as follows
>> >> > RC Rootport ----->BUS------> EP
>> >> >
>> >> > > The AER capability is an optional capability of PCIe device functions.
>> >> > > A host bridge is not itself a PCIe function; it's a bridge between a
>> >> > > platform-specific host bus and the PCIe bus.
>> >> > >
>> >> > > Sometimes there is a PCI function that corresponds to the host bridge,
>> >> > > but that's not required by the PCI specs and there is no generic
>> >> > > programming model for it.
>> >> > >
>> >> > > If you have an PCIe function corresponding to the Mobiveil host
>> >> > > bridge, and it has an AER capability, what would you want the error
>> >> > > handlers to do?  This function would not normally be a Root Port or
>> >> > > other type 1 PCI-to-PCI bridge device, so it's not clear how its AER
>> >> > > would integrate with the PCIe hierarchy.
>> >> > >
>> >> > Yes we do have a PCI function with AER capability, after an AER
>> >> > reported by EP, AER driver initiates an hot_reset on subordinate
>> >> > bus, which happens to be downstream port for RC. So we get a
>> >> > downstream port link down happens in this case RC driver needs to
>> >> > follow a specific register restore sequence, which is most of the
>> >> > HW specific initialization done in probe function of the driver
>> >> > to recover properly.
>> >
>> > I'm still not quite getting your point.  Let's try a concrete example.
>> > Assume this topology (this is what you described above):
>> >
>> >   00:00.0 Root Port to [bus 01]
>> >   01:00.0 Endpoint
>> >
>> > Then we have this situation:
>> >
>> >   - AER driver binds to Root Port 00:00.0
>> >   - Endpoint 01:00.0 detects an error and sends ERR_* message upstream
>> >   - Root Port 00:00.0 receives ERR_* message
>> >   - Root Port 00:00.0 generates AER interrupt
>> >   - AER driver handles the interrupt and does a secondary bus reset on
>> >     Root Port 00:00.0.  This resets the Endpoint (01:00.0) but not the
>> >     Root Port itself.
>> >
>> > It sounds like you need to do an RC register restore sequence at this
>> > point, i.e., something like mobiveil_host_init()?  That sounds like a
>> > hardware defect.  Doing a secondary bus reset via the Root Port's
>> > bridge control register should have no effect on the RC itself.
>> >
>> > What am I missing?
>> Yes, in this situation PCI config registers are fine, but due to a bug
>> RC, config space registers will get reset
>> in the SW we need to call the mobiveil_host_init() again to recover in
>> case of directly connected EP to the Root-port.
>>
>> We do get an local interrupt when this problem occurs, so this can be
>> handled in Driver's ISR;
>> or is there a better way to handle it ?
>
> I don't see a better way to handle it.
>
> If the PCI core were doing something wrong here, e.g., missing some
> restore required by hardware that follows the spec, we would of course
> want a nice way to fix it in the core.
>
> But this sounds like hardware that does not follow the spec, so the
> workaround should be in the hardware-specific driver.
>
> Note that this may be an issue if/when you want to support ACPI with
> this hardware, because in that case we use the generic ACPI host
> bridge driver (drivers/acpi/pci_root.c) and there is less opportunity
> for device-specific workarounds like this.
>
> Bjorn



[Index of Archives]     [DMA Engine]     [Linux Coverity]     [Linux USB]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Greybus]

  Powered by Linux