RE: [PATCH] pci, Add AER_panic sysfs file

> -----Original Message-----
> From: Prarit Bhargava [mailto:prarit@xxxxxxxxxx]
> Sent: Thursday, May 17, 2012 1:39 PM
> To: Iyer, Shyam
> Cc: linux-pci@xxxxxxxxxxxxxxx; bhelgaas@xxxxxxxxxx
> Subject: Re: [PATCH] pci, Add AER_panic sysfs file
> 
> 
> 
> On 05/17/2012 01:29 PM, Shyam_Iyer@xxxxxxxx wrote:
> >
> >
> >> -----Original Message-----
> >> From: linux-pci-owner@xxxxxxxxxxxxxxx [mailto:linux-pci-
> >> owner@xxxxxxxxxxxxxxx] On Behalf Of Prarit Bhargava
> >> Sent: Thursday, May 17, 2012 1:05 PM
> >> To: linux-pci@xxxxxxxxxxxxxxx
> >> Cc: Prarit Bhargava; Bjorn Helgaas
> >> Subject: [PATCH] pci, Add AER_panic sysfs file
> >>
> >> Consider the following case
> >>
> >> 		[ RP ]
> >> 		  |
> >> 		  |
> >> 	+---------+-----------+
> >> 	|	  |	      |
> >>        [H1]      [H2]        [X1]
> >>
> >> where RP is a PCIE Root Port, H1 and H2 are devices with drivers
> >> that support PCIE AER driver error handling (ie, they have
> >> pci_error_handlers defined in the driver), and X1 is a device with
> >> a driver that does not support PCIE AER driver error handling.
> >>
> >> If the Root Port takes an error what currently happens is that the
> >> bus resets and H1 & H2 call their slot_reset functions.  X1 does
> >> nothing.
> >>
> >> In some cases a user may not wish the system to continue because
> >> X1 is an unhardened driver.  In these cases, the system should not
> >> do a bus reset, but rather the system should panic to avoid any
> >> further possible data corruption.
> >
> > Do we need to panic for both correctable and uncorrectable errors?
> >
> > I thought correctable errors could recover without a bus reset.
> 
> Will a bus reset be issued on a correctable error?  I thought the code
> path was that the bus reset was issued on the uncorrectable error.
> 
> drivers/pci/pcie/aer/aerdrv_core.c: do_recovery()
> 
>         if (severity == AER_FATAL) {
>                 result = reset_link(dev);
>                 if (result != PCI_ERS_RESULT_RECOVERED)
>                         goto failed;
>         }
> 
> I may not be looking at the right spot of code.  Care to enlighten me? :)
> 
> P.

Actually, I was reading the documentation in
Documentation/PCI/pcieaer-howto.txt:

"
Correctable errors pose no impacts on the functionality of the
interface. The PCI Express protocol can recover without any software
intervention or any loss of data. These errors are detected and
corrected by hardware. Unlike correctable errors, uncorrectable
errors impact functionality of the interface. Uncorrectable errors
can cause a particular transaction or a particular PCI Express link
to be unreliable. Depending on those error conditions, uncorrectable
errors are further classified into non-fatal errors and fatal errors.
Non-fatal errors cause the particular transaction to be unreliable,
but the PCI Express link itself is fully functional. Fatal errors, on
the other hand, cause the link to be unreliable.
"

But anyway, the severity is set to AER_FATAL only for uncorrectable
errors, never for correctable ones, which means reset_link() is not
called for correctable errors.

drivers/pci/pcie/aer/aerdrv_core.c

        if (e_src->status & PCI_ERR_ROOT_UNCOR_RCV) {
                e_info->id = ERR_UNCOR_ID(e_src->id);

                if (e_src->status & PCI_ERR_ROOT_FATAL_RCV)
                        e_info->severity = AER_FATAL;
                else
                        e_info->severity = AER_NONFATAL;

                if (e_src->status & PCI_ERR_ROOT_MULTI_UNCOR_RCV)
                        e_info->multi_error_valid = 1;
                else
                        e_info->multi_error_valid = 0;

                aer_print_port_info(p_device->port, e_info);

                if (find_source_device(p_device->port, e_info))
                        aer_process_err_devices(p_device, e_info);
        }
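
To make the argument concrete, here is a small userspace sketch (not
kernel code) of how the two quoted snippets combine. Everything named
demo_* or DEMO_* is hypothetical and exists only for illustration. The
bit values are my assumption of what include/linux/pci_regs.h defines
for the Root Error Status register, so please check them against the
header before relying on them.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumed copies of the Root Error Status bits; renamed so they are
 * not mistaken for the kernel's own PCI_ERR_ROOT_* definitions.
 */
#define DEMO_ROOT_COR_RCV    0x00000001u /* correctable error received */
#define DEMO_ROOT_UNCOR_RCV  0x00000004u /* uncorrectable error received */
#define DEMO_ROOT_FATAL_RCV  0x00000040u /* fatal error message received */

/* Hypothetical severity values used by this sketch only. */
enum demo_severity { DEMO_CORRECTABLE, DEMO_NONFATAL, DEMO_FATAL };

/* Classify an uncorrectable report the way the quoted branch does. */
static enum demo_severity demo_uncor_severity(uint32_t status)
{
	return (status & DEMO_ROOT_FATAL_RCV) ? DEMO_FATAL : DEMO_NONFATAL;
}

/* Mirrors the "severity == AER_FATAL" test quoted from do_recovery(). */
static bool demo_needs_link_reset(enum demo_severity sev)
{
	return sev == DEMO_FATAL;
}

/* True if a report with this status would lead to a link reset. */
static bool demo_status_resets_link(uint32_t status)
{
	if (status & DEMO_ROOT_UNCOR_RCV)
		return demo_needs_link_reset(demo_uncor_severity(status));

	/* Correctable-only report: the severity is never FATAL. */
	return false;
}
```

With these definitions, demo_status_resets_link() is false for a
correctable-only status and for an uncorrectable non-fatal status, and
true only when both the uncorrectable and fatal bits are set.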