RE: Having problems resetting a PCI device

Hi Bjorn,

> -----Original Message-----
> From: Bjorn Helgaas [mailto:helgaas@xxxxxxxxxx]
> Sent: Wednesday, March 29, 2017 4:55 PM
> To: Zytaruk, Kelly
> Cc: linux-pci@xxxxxxxxxxxxxxx; Alex Williamson
> Subject: Re: Having problems resetting a PCI device
> 
> Hi Kelly,
> 
> On Wed, Mar 29, 2017 at 08:03:33PM +0000, Zytaruk, Kelly wrote:
> > I have a PCI device that is sitting behind a bridge.
> >
> > Under certain reproducible circumstances the PCI device will become
> > inactive. Reading the PCI config space returns all 0xFFFFFFFF.
> >
> > The bridge appears to still be functional. Reading the status from the
> > bridge I see a Fatal Error due to a Surprise Down event.
> 
> Just to be specific, is this the "Surprise Down Error" in the AER uncorrectable
> error status register?  "lspci -vv" probably decodes all that for you.

Yes, the Surprise Down Error shows up in the bridge's PCI config space.
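
For reference, the check can be sketched in userspace against a snapshot of the bridge's config space (for example the sysfs `config` file read as root, up to 4096 bytes for a PCIe device). This is only a minimal sketch: the helper names are hypothetical, and the constants (AER extended capability ID 0x0001, Uncorrectable Error Status at +0x04, Surprise Down Error as bit 5) are taken from the PCIe spec and the kernel's pci_regs.h as I understand them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CFG_SPACE_SIZE   4096u       /* PCIe extended config space size */
#define EXT_CAP_START    0x100u      /* first extended capability header */
#define EXT_CAP_ID_AER   0x0001u     /* Advanced Error Reporting */
#define AER_UNCOR_STATUS 0x04u       /* offset inside the AER capability */
#define AER_UNC_SURPDN   0x00000020u /* bit 5: Surprise Down Error */

/* Read a little-endian dword from a config-space snapshot. */
static uint32_t cfg_read32(const uint8_t *cfg, size_t off)
{
    return (uint32_t)cfg[off] |
           ((uint32_t)cfg[off + 1] << 8) |
           ((uint32_t)cfg[off + 2] << 16) |
           ((uint32_t)cfg[off + 3] << 24);
}

/* Walk the extended capability list; return the AER offset, or 0 if absent.
 * cfg must point at a full CFG_SPACE_SIZE-byte snapshot (hypothetical helper). */
static size_t find_aer_cap(const uint8_t *cfg)
{
    size_t off = EXT_CAP_START;
    int guard = (int)((CFG_SPACE_SIZE - EXT_CAP_START) / 8); /* bounds the walk */

    assert(cfg != NULL);
    while (off >= EXT_CAP_START && off <= CFG_SPACE_SIZE - 8 && guard-- > 0) {
        uint32_t hdr = cfg_read32(cfg, off);

        if (hdr == 0 || hdr == 0xFFFFFFFFu)
            break;                      /* empty list or dead device */
        if ((hdr & 0xFFFFu) == EXT_CAP_ID_AER)
            return off;
        off = (hdr >> 20) & 0xFFCu;     /* next pointer, dword aligned */
    }
    return 0;
}

/* 1 if Surprise Down Error is logged, 0 if not, -1 if there is no AER cap. */
static int surprise_down_logged(const uint8_t *cfg)
{
    size_t aer = find_aer_cap(cfg);

    if (aer == 0)
        return -1;
    return (cfg_read32(cfg, aer + AER_UNCOR_STATUS) & AER_UNC_SURPDN) ? 1 : 0;
}
```

Calling `surprise_down_logged()` on a buffer filled from the bridge's config file should give the same answer that "lspci -vv" decodes, assuming the snapshot really covers the extended space.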

> 
> > I am trying to figure out how to bring the device back online.
> >
> > I tried toggling the secondary bus reset bit of the Bridge Control
> > Register but it doesn't appear to make any difference. I still see
> > 0xFFFFFFFF in the device config space.
> 
> Are you calling pci_reset_function() or doing this by hand?
> pci_reset_function() tries several different strategies, one of which is toggling
> the secondary bus reset bit.

I am doing it by hand.
I only found pci_reset_function() about 5 minutes ago while scanning through pci.c for clues.
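
For context, the by-hand sequence amounts to roughly the following, shown here as a userspace sketch against the bridge's sysfs `config` file rather than as driver code. The Bridge Control offset (0x3E) and the Secondary Bus Reset bit (0x40) come from the PCI-to-PCI bridge register layout; the function names, the path argument, and the delay parameters are illustrative assumptions. As far as I can tell from pci.c, the kernel's own helper holds the bit for about 2 ms and waits about 1 s after clearing it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>
#include <unistd.h>

#define PCI_BRIDGE_CONTROL       0x3E /* 16-bit Bridge Control register */
#define PCI_BRIDGE_CTL_BUS_RESET 0x40 /* Secondary Bus Reset bit */

static void sleep_ms(long ms)
{
    struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };

    nanosleep(&ts, NULL);
}

/* Read-modify-write Bridge Control: set the bits in 'set', clear those in 'clear'. */
static int rmw_bridge_ctl(int fd, uint16_t set, uint16_t clear)
{
    uint8_t b[2];
    uint16_t v;

    if (pread(fd, b, 2, PCI_BRIDGE_CONTROL) != 2)
        return -1;
    v = (uint16_t)(b[0] | (b[1] << 8));         /* config space is little-endian */
    v = (uint16_t)((v | set) & ~clear);
    b[0] = (uint8_t)(v & 0xFF);
    b[1] = (uint8_t)(v >> 8);
    return pwrite(fd, b, 2, PCI_BRIDGE_CONTROL) == 2 ? 0 : -1;
}

/* Assert Secondary Bus Reset for assert_ms, release it, then wait settle_ms.
 * bridge_cfg_path is the bridge's config file (hypothetical argument). */
static int secondary_bus_reset(const char *bridge_cfg_path,
                               long assert_ms, long settle_ms)
{
    int fd, rc;

    assert(bridge_cfg_path != NULL);
    fd = open(bridge_cfg_path, O_RDWR);
    if (fd < 0)
        return -1;
    rc = rmw_bridge_ctl(fd, PCI_BRIDGE_CTL_BUS_RESET, 0);
    if (rc == 0) {
        sleep_ms(assert_ms);
        rc = rmw_bridge_ctl(fd, 0, PCI_BRIDGE_CTL_BUS_RESET);
        sleep_ms(settle_ms);                    /* give the device time to come back */
    }
    close(fd);
    return rc;
}
```

A call such as `secondary_bus_reset(path, 2, 1000)` mirrors those kernel delays; after it returns, the device's vendor/device ID can be re-read to see whether config space still returns 0xFFFFFFFF.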

> 
> > I provided a pci_error_handler but the error_detected() function is
> > not getting called.
> 
> Do you have CONFIG_PCIEAER turned on?  I would naively expect AER to log
> something and call your error_detected() function if this error occurs (but I

CONFIG_PCIEAER=y, but error_detected() is not getting called.

I also noticed that CONFIG_HOTPLUG_PCI_PCIE=y.  How do I trigger a hot unplug followed by a hot plug?
Maybe that would bring it back?
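
One software-only approximation of an unplug/replug is the sysfs remove/rescan pair: write 1 to the device's `remove` attribute, then write 1 to `/sys/bus/pci/rescan`. This is not a real pciehp event, and it may not help if the link itself stays down. A minimal userspace sketch follows; the helper names and the `root` parameter are hypothetical, and it assumes the attributes behave the same way in dom0 under Xen.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write "1" to a sysfs attribute; returns 0 on success, -1 on failure. */
static int write_one(const char *path)
{
    FILE *f;
    int ok;

    assert(path != NULL);
    f = fopen(path, "w");
    if (f == NULL)
        return -1;
    ok = (fputs("1", f) >= 0);
    if (fclose(f) != 0)     /* the buffered write is flushed here */
        ok = 0;
    return ok ? 0 : -1;
}

/* Build "<root>/devices/<bdf>/remove" into buf; -1 if it does not fit. */
static int remove_path(char *buf, size_t len, const char *root, const char *bdf)
{
    int n;

    if (strlen(bdf) == 0)
        return -1;          /* reject an empty device address */
    n = snprintf(buf, len, "%s/devices/%s/remove", root, bdf);
    return (n > 0 && (size_t)n < len) ? 0 : -1;
}

/* Remove the device, then ask the PCI core to rescan all buses.
 * root is normally "/sys/bus/pci"; bdf is a domain:bus:dev.fn address. */
static int remove_and_rescan(const char *root, const char *bdf)
{
    char path[256];
    int n;

    if (remove_path(path, sizeof path, root, bdf) != 0)
        return -1;
    if (write_one(path) != 0)
        return -1;
    n = snprintf(path, sizeof path, "%s/rescan", root);
    if (n <= 0 || (size_t)n >= sizeof path)
        return -1;
    return write_one(path);
}
```

Run as root, a call would look like `remove_and_rescan("/sys/bus/pci", "0000:03:00.0")`, where the address is only a placeholder for the failed device.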

> haven't looked at the code for a long time).
> 
> > Given that these two methods are not helping me out what other choices
> > do I have to either reset the PCI device or hot-plug the device from a
> > kernel driver. Or some other method of bring the device back to life.
> >
> > Note that I am running Linux 4.8 in dom0 on Xen (if that makes a
> > difference).
> >
> > Thanks, Kelly



