On 20/10, Bjorn Helgaas wrote: > [+cc Keith, Sinan, Oza] > > On Tue, Oct 05, 2021 at 10:48:12PM +0530, Naveen Naidu wrote: > > In the EDR path, AER registers are cleared *after* DPC error event is > > processed. The process stack in EDR is: > > > > edr_handle_event() > > dpc_process_error() > > pci_aer_raw_clear_status() > > pcie_do_recovery() > > > > But in DPC path, AER status registers are cleared *while* processing > > the error. The process stack in DPC is: > > > > dpc_handler() > > dpc_process_error() > > pci_aer_clear_status() > > pcie_do_recovery() > > These are accurate but they both include dpc_process_error(), so we > need a hint to show why the one here is different from the one in the > EDR path, e.g., > > dpc_handler > dpc_process_error > if (reason == 0) > pci_aer_clear_status # uncorrectable errors only > pcie_do_recovery > > > In EDR path, AER status registers are cleared irrespective of whether > > the error was an RP PIO or unmasked uncorrectable error. But in DPC, the > > AER status registers are cleared only when it's an unmasked uncorrectable > > error. > > > > This leads to two different behaviours for the same task (handling of > > DPC errors) in FFS systems and when native OS has control. > > FFS? > Firmware First Systems > I'd really like to have a specific example of how a user would observe > this difference. I know you probably don't have two systems to > compare like that, but maybe we can work it out manually. > Apologies again! Reading through the code again and the specification, I realize that my understanding was very incorrect at the time of making this patch. I grossly oversimplified EDR and DPC when I was learning about it. I'll drop this patch when I send the v5 for the series. Apologies again ^^' > I guess you're saying the problem is in the native DPC handling, and > we don't clear the AER status registers for ERR_NONFATAL, > ERR_NONFATAL, etc., right? > But yes, I did have this question though (I wasn't able to find the answers to it when reading the spec). Why do we not clear the entire ERR_NONFATAL and ERR_FATAL registers in the DPC path just like EDR does using the pci_aer_raw_clear_status() before going to pcie_do_recovery() I am sure I might have missed something in the spec. I guess I'll look/re-read these bits again. Thanks for the review :) > I think the current behavior is from 8aefa9b0d910 ("PCI/DPC: Print AER > status in DPC event handling"), where Keith explicitly mentions those > cases. The commit log here should connect back to that and explain > whether something has changed. > > I cc'd Keith and the reviewers of that change in case any of them have > time to dig into this again. > > > Bring the same semantics for clearing the AER status register in EDR > > path and DPC path. > > > > Signed-off-by: Naveen Naidu <naveennaidu479@xxxxxxxxx> > > --- > > drivers/pci/pcie/dpc.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c > > index faf4a1e77fab..68899a3db126 100644 > > --- a/drivers/pci/pcie/dpc.c > > +++ b/drivers/pci/pcie/dpc.c > > @@ -288,7 +288,6 @@ void dpc_process_error(struct pci_dev *pdev) > > dpc_get_aer_uncorrect_severity(pdev, &info) && > > aer_get_device_error_info(pdev, &info)) { > > aer_print_error(pdev, &info); > > - pci_aer_clear_status(pdev); > > } > > } > > > > @@ -297,6 +296,7 @@ static irqreturn_t dpc_handler(int irq, void *context) > > struct pci_dev *pdev = context; > > > > dpc_process_error(pdev); > > + pci_aer_clear_status(pdev); > > > > /* We configure DPC so it only triggers on ERR_FATAL */ > > pcie_do_recovery(pdev, pci_channel_io_frozen, dpc_reset_link); > > -- > > 2.25.1 > > > > _______________________________________________ > > Linux-kernel-mentees mailing list > > Linux-kernel-mentees@xxxxxxxxxxxxxxxxxxxxxxxxx > > https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees