RE: [PATCH] pci, Add AER_panic sysfs file

> -----Original Message-----
> From: Iyer, Shyam
> Sent: Thursday, May 17, 2012 1:52 PM
> To: 'Prarit Bhargava'
> Cc: 'linux-pci@xxxxxxxxxxxxxxx'; 'bhelgaas@xxxxxxxxxx'
> Subject: RE: [PATCH] pci, Add AER_panic sysfs file
> 
> 
> 
> > -----Original Message-----
> > From: Prarit Bhargava [mailto:prarit@xxxxxxxxxx]
> > Sent: Thursday, May 17, 2012 1:39 PM
> > To: Iyer, Shyam
> > Cc: linux-pci@xxxxxxxxxxxxxxx; bhelgaas@xxxxxxxxxx
> > Subject: Re: [PATCH] pci, Add AER_panic sysfs file
> >
> >
> >
> > On 05/17/2012 01:29 PM, Shyam_Iyer@xxxxxxxx wrote:
> > >
> > >
> > >> -----Original Message-----
> > >> From: linux-pci-owner@xxxxxxxxxxxxxxx [mailto:linux-pci-
> > >> owner@xxxxxxxxxxxxxxx] On Behalf Of Prarit Bhargava
> > >> Sent: Thursday, May 17, 2012 1:05 PM
> > >> To: linux-pci@xxxxxxxxxxxxxxx
> > >> Cc: Prarit Bhargava; Bjorn Helgaas
> > >> Subject: [PATCH] pci, Add AER_panic sysfs file
> > >>
> > >> Consider the following case
> > >>
> > >> 		[ RP ]
> > >> 		  |
> > >> 		  |
> > >> 	+---------+-----------+
> > >> 	|	  |	      |
> > >>        [H1]      [H2]        [X1]
> > >>
> > >> where RP is a PCIE Root Port, H1 and H2 are devices with drivers that
> > >> support PCIE AER driver error handling (ie, they have pci_error_handlers
> > >> defined in the driver), and X1 is a device with a driver that does not
> > >> support PCIE AER driver error handling.
> > >>
> > >> If the Root Port takes an error what currently happens is that the
> > >> bus resets and H1 & H2 call their slot_reset functions.  X1 does
> > >> nothing.
> > >>
> > >> In some cases a user may not wish the system to continue because X1 is
> > >> an unhardened driver.  In these cases, the system should not do a bus
> > >> reset, but rather the system should panic to avoid any further possible
> > >> data corruption.
> > >
> > > Do we need to panic for both correctable and uncorrectable errors?
> > >
> > > I thought correctable errors could recover without a bus reset.
> >
> > Will a bus reset be issued on a correctable error?  I thought the code
> > path was that the bus reset was issued on the uncorrectable error.
> >
> > drivers/pci/pcie/aer/aerdrv_core.c: do_recovery()
> >
> >         if (severity == AER_FATAL) {
> >                 result = reset_link(dev);
> >                 if (result != PCI_ERS_RESULT_RECOVERED)
> >                         goto failed;
> >         }
> >
> > I may not be looking at the right spot of code.  Care to enlighten me? :)
> >
> > P.
> 
> Actually, I was reading the documentation in
> Documentation/PCI/pcieaer-howto.txt:
> 
> "
> Correctable errors pose no impacts on the functionality of the
> interface. The PCI Express protocol can recover without any software
> intervention or any loss of data. These errors are detected and
> corrected by hardware. Unlike correctable errors, uncorrectable
> errors impact functionality of the interface. Uncorrectable errors
> can cause a particular transaction or a particular PCI Express link
> to be unreliable. Depending on those error conditions, uncorrectable
> errors are further classified into non-fatal errors and fatal errors.
> Non-fatal errors cause the particular transaction to be unreliable,
> but the PCI Express link itself is fully functional. Fatal errors, on
> the other hand, cause the link to be unreliable.
> "
> 
> But anyway, AER_FATAL is set only for uncorrectable errors, never for
> correctable ones, which means reset_link doesn't happen for correctable
> errors.
> 
> drivers/pci/pcie/aer/aerdrv_core.c
> 
>         if (e_src->status & PCI_ERR_ROOT_UNCOR_RCV) {
>                 e_info->id = ERR_UNCOR_ID(e_src->id);
> 
>                 if (e_src->status & PCI_ERR_ROOT_FATAL_RCV)
>                         e_info->severity = AER_FATAL;
>                 else
>                         e_info->severity = AER_NONFATAL;
> 
>                 if (e_src->status & PCI_ERR_ROOT_MULTI_UNCOR_RCV)
>                         e_info->multi_error_valid = 1;
>                 else
>                         e_info->multi_error_valid = 0;
> 
>                 aer_print_port_info(p_device->port, e_info);
> 
>                 if (find_source_device(p_device->port, e_info))
>                         aer_process_err_devices(p_device, e_info);
>         }



Looks like we are saying the same thing. I had just misunderstood and thought you were panicking on every error, correctable ones included.
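
To spell out the path we both described, here is a minimal standalone sketch. It is not the kernel code: the SK_ bit masks, the sk_ names and the mask values are made up for illustration, and only the branching follows the aerdrv_core.c excerpts quoted above. As a simplification it reports a single severity per status word, with uncorrectable taking precedence.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the Root Error Status bits tested in the
 * quoted excerpt. The values are illustrative only. */
#define SK_ROOT_COR_RCV    0x01u  /* correctable error received */
#define SK_ROOT_UNCOR_RCV  0x04u  /* uncorrectable error received */
#define SK_ROOT_FATAL_RCV  0x40u  /* the uncorrectable error was fatal */

enum sk_severity { SK_NONE, SK_CORRECTABLE, SK_NONFATAL, SK_FATAL };

/* Map a status word to a severity, mirroring the quoted if/else. */
static enum sk_severity sk_classify(uint32_t status)
{
        if (status & SK_ROOT_UNCOR_RCV)
                return (status & SK_ROOT_FATAL_RCV) ? SK_FATAL : SK_NONFATAL;
        if (status & SK_ROOT_COR_RCV)
                return SK_CORRECTABLE;
        return SK_NONE;
}

/* Mirrors the "if (severity == AER_FATAL)" guard around reset_link(). */
static bool sk_needs_link_reset(enum sk_severity sev)
{
        return sev == SK_FATAL;
}

/* Small self-check that doubles as a usage example. */
static void sk_self_check(void)
{
        assert(!sk_needs_link_reset(sk_classify(SK_ROOT_COR_RCV)));
        assert(!sk_needs_link_reset(sk_classify(SK_ROOT_UNCOR_RCV)));
        assert(sk_needs_link_reset(
                sk_classify(SK_ROOT_UNCOR_RCV | SK_ROOT_FATAL_RCV)));
}
```

Only the fatal case reaches the link reset, so a panic hooked in at that point would never fire for correctable or non-fatal errors.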

The patch looks good to me if it matters.
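
For the record, this is how I read the intended policy for the [RP]/[H1]/[H2]/[X1] example. It is a rough sketch only: the patch body is not quoted in this thread, so the struct, the function name and the panic_enabled flag are all hypothetical, and I am assuming the panic is chosen only when some device under the port lacks error handlers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical view of one device below the Root Port. */
struct sk_dev {
        const char *name;
        bool has_err_handlers;  /* driver defines pci_error_handlers */
};

enum sk_action { SK_ACT_RESET_BUS, SK_ACT_PANIC };

/* Decide what to do on a fatal error at the Root Port.
 * panic_enabled stands in for whatever the sysfs file would set. */
static enum sk_action sk_fatal_action(const struct sk_dev *devs, size_t n,
                                      bool panic_enabled)
{
        size_t i;

        assert(devs != NULL || n == 0);

        if (panic_enabled) {
                for (i = 0; i < n; i++) {
                        /* An unhardened driver (X1) cannot recover. */
                        if (!devs[i].has_err_handlers)
                                return SK_ACT_PANIC;
                }
        }
        /* Current behaviour: reset the bus, hardened drivers recover. */
        return SK_ACT_RESET_BUS;
}
```

With H1 and H2 hardened and X1 not, this picks the panic only when the flag is set; with the flag clear it falls back to the bus reset that happens today.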



