Re: [Patch]PCI:AER:Notify which device has no error_detected callback

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, 16 Dec 2019 11:58:32 -0600
Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:

> [+cc Alex]
> 
> On Mon, Dec 16, 2019 at 06:30:11PM +0800, Yicong Yang wrote:
> > On 2019/12/14 6:57, Bjorn Helgaas wrote:  
> > > On Fri, Dec 13, 2019 at 07:44:34PM +0800, Yicong Yang wrote:  
> > >> The PCI error recovery will fail if any device under
> > >> root port doesn't have an error_detected callback.
> > >> Currently only failure result is printed, which is
> > >> not enough to determine which device leads to the
> > >> failure and the detailed failure reason.
> > >>
> > >> Add print information if certain device under root
> > >> port has no error_detected callback.
> > >>
> > >> Signed-off-by: Yicong Yang <yangyicong@xxxxxxxxxxxxx>  
> > > Applied to pci/aer for v5.6, thanks!
> > >
> > > I also added a trivial follow-on patch to factor out the "AER: "
> > > prefix (attached below).  This code is now used by DPC as well as AER,
> > > so "AER: " might not be exactly the correct prefix, but I didn't try
> > > to untangle that.
> > >  
> > >> ---
> > >>  drivers/pci/pcie/err.c | 4 +++-
> > >>  1 file changed, 3 insertions(+), 1 deletion(-)
> > >>
> > >> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
> > >> index b0e6048..ec37c33 100644
> > >> --- a/drivers/pci/pcie/err.c
> > >> +++ b/drivers/pci/pcie/err.c
> > >> @@ -61,8 +61,10 @@ static int report_error_detected(struct pci_dev *dev,
> > >>  		 * error callbacks of "any" device in the subtree, and will
> > >>  		 * exit in the disconnected error state.
> > >>  		 */
> > >> -		if (dev->hdr_type != PCI_HEADER_TYPE_BRIDGE)
> > >> +		if (dev->hdr_type != PCI_HEADER_TYPE_BRIDGE) {
> > >>  			vote = PCI_ERS_RESULT_NO_AER_DRIVER;
> > >> +			pci_info(dev, "AER: Device has no error_detected callback\n");
> > >> +		}
> > >>  		else
> > >>  			vote = PCI_ERS_RESULT_NONE;
> > >>  	} else {
> > >> --
> > >> 2.8.1
> > >>  
> > > commit 9694ef043ea4 ("PCI/AER: Factor message prefixes with dev_fmt()")
> > > Author: Bjorn Helgaas <bhelgaas@xxxxxxxxxx>
> > > Date:   Fri Dec 13 16:46:05 2019 -0600
> > >
> > >     PCI/AER: Factor message prefixes with dev_fmt()
> > >     
> > >     Define dev_fmt() with the common prefix of log messages so we don't have to
> > >     repeat it in every printk.  No functional change intended.
> > >     
> > >     Signed-off-by: Bjorn Helgaas <bhelgaas@xxxxxxxxxx>
> > >
> > > diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
> > > index abde5e000f5d..747ef8b41d1f 100644
> > > --- a/drivers/pci/pcie/err.c
> > > +++ b/drivers/pci/pcie/err.c
> > > @@ -10,6 +10,8 @@
> > >   *	Zhang Yanmin (yanmin.zhang@xxxxxxxxx)
> > >   */
> > >  
> > > +#define dev_fmt(fmt) "AER: " fmt
> > > +
> > >  #include <linux/pci.h>
> > >  #include <linux/module.h>
> > >  #include <linux/kernel.h>
> > > @@ -64,7 +66,7 @@ static int report_error_detected(struct pci_dev *dev,
> > >  		 */
> > >  		if (dev->hdr_type != PCI_HEADER_TYPE_BRIDGE) {
> > >  			vote = PCI_ERS_RESULT_NO_AER_DRIVER;
> > > -			pci_info(dev, "AER: Driver has no error_detected callback\n");
> > > +			pci_info(dev, "driver has no error_detected callback\n");  
> > 
> > I think use "device" here is more proper. Sometimes the device is
> > even not bind to the driver, which will also lead to the recovery
> > failure.  
> 
> Hmmm, maybe so, although the error_detected callback is definitely a
> property of the *driver*, not the device.  What would you think of
> something like this?
> 
>   pci_info(dev, dev->driver ? "driver not capable of error recovery\n" :
>                               "no driver\n");
> 
> I am curious about the larger question of why we fail if the device
> has no driver.  I understand why we fail the recovery for a device
> with a driver that has no err_handlers: we can't just reset the device
> out from under the driver.
> 
> But why do we fail if a device has no driver at all?  Shouldn't it be
> safe to reset such a device?  The commit log of 918b4053184c
> ("PCI/AER: Report success only when every device has AER-aware
> driver") suggests that it has something to do with KVM and the
> pci-stub driver, but I don't understand it.

That commit is from 2012, when legacy KVM device assignment existed in
the kernel and was able to drive a device without really binding to it
as a driver.  Sane users of this mechanism would at least bind the
device to pci-stub to prevent it from being used by both a host driver
and KVM simultaneously.  In any case, legacy KVM device assignment no
longer exists in the kernel, so if that was the justification to not
reset driver-less devices, we can probably fill that gap now.  Thanks,

Alex




[Index of Archives]     [DMA Engine]     [Linux Coverity]     [Linux USB]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Greybus]

  Powered by Linux