Re: [PATCH 5/8] PCI/AER: Introduce ratelimit for AER IRQs

On Thu, Feb 06, 2025 at 02:56:20PM +0100, Karolina Stolarek wrote:
> On 25/01/2025 08:39, Lukas Wunner wrote:
> > Masking errors at the register level feels overzealous,
> > in particular because it also disables logging via tracepoints.
> > 
> > Is there a concrete device that necessitates this change?
> 
> I faced issues with excessive Correctable Errors reporting with Samsung
> PM1733 NVMe (a couple of thousand errors per hour), which were still
> polluting the logs even after introducing a ratelimit

I'd suggest adding a "u32 aer_cor_mask" to "struct pci_dev" in the
"#ifdef CONFIG_PCIEAER" section.

Then add a "DECLARE_PCI_FIXUP_HEADER()" macro in drivers/pci/quirks.c
for the Samsung PM1733 that calls a new function which sets exactly the
error bits you're seeing in aer_cor_mask.  This should be guarded by
"#ifdef CONFIG_PCIEAER" as well.

Finally, amend aer.c to set the bits in aer_cor_mask in the
PCI_ERR_COR_MASK register on probe.
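
To illustrate the three steps, here is a rough user-space mock, not
kernel code.  The register offset and error bits are the values from
include/uapi/linux/pci_regs.h, and 0x144d is Samsung's vendor ID.
Everything else is an assumption made for the sketch: the device ID is
a placeholder, the two bits set by the quirk are arbitrary stand-ins
for whatever the drive actually reports, and the struct, the fixup
table and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Values as defined in include/uapi/linux/pci_regs.h */
#define PCI_ERR_COR_MASK	0x14		/* Correctable Error Mask offset */
#define PCI_ERR_COR_RCVR	0x00000001	/* Receiver Error */
#define PCI_ERR_COR_BAD_TLP	0x00000040	/* Bad TLP */
#define PCI_ERR_COR_ADV_NFAT	0x00002000	/* Advisory Non-Fatal Error */

#define PCI_VENDOR_ID_SAMSUNG	0x144d
/* Placeholder only, NOT the real PM1733 device ID */
#define EXAMPLE_PM1733_DEVICE_ID 0xffff

/* Mock of struct pci_dev, reduced to what the sketch needs */
struct pci_dev {
	uint16_t vendor;
	uint16_t device;
	uint32_t aer_cor_mask;	/* step 1: the proposed new member */
	uint32_t aer_cap[16];	/* fake AER capability, indexed by offset / 4 */
};

/*
 * Step 2: quirk recording which correctable errors to mask.
 * The bits chosen here are an assumption for illustration.
 */
void quirk_samsung_pm1733_aer(struct pci_dev *dev)
{
	dev->aer_cor_mask |= PCI_ERR_COR_RCVR | PCI_ERR_COR_BAD_TLP;
}

/* Stand-in for DECLARE_PCI_FIXUP_HEADER(vendor, device, hook) entries */
struct fixup {
	uint16_t vendor;
	uint16_t device;
	void (*hook)(struct pci_dev *dev);
};

static const struct fixup header_fixups[] = {
	{ PCI_VENDOR_ID_SAMSUNG, EXAMPLE_PM1733_DEVICE_ID,
	  quirk_samsung_pm1733_aer },
};

/* Run every fixup whose vendor/device pair matches the device */
void run_header_fixups(struct pci_dev *dev)
{
	size_t i;

	for (i = 0; i < sizeof(header_fixups) / sizeof(header_fixups[0]); i++)
		if (header_fixups[i].vendor == dev->vendor &&
		    header_fixups[i].device == dev->device)
			header_fixups[i].hook(dev);
}

/*
 * Step 3: what aer.c would do on probe.  Read-modify-write so that
 * bits already set in the mask register are preserved.
 */
void aer_apply_cor_mask(struct pci_dev *dev)
{
	uint32_t mask = dev->aer_cap[PCI_ERR_COR_MASK / 4];

	mask |= dev->aer_cor_mask;
	dev->aer_cap[PCI_ERR_COR_MASK / 4] = mask;
}
```

In the kernel the last step would go through pci_read_config_dword()
and pci_write_config_dword() on the device's AER capability rather than
an array.  A device without a matching quirk keeps aer_cor_mask at zero,
so its mask register is left untouched.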


> > If there is, consider adding a quirk for this particular device
> > which masks specific errors, but doesn't affect other devices.
> 
> There were many other reports of Correctable Error floods, as signaled in
> the cover letter, so it's hard to pinpoint the specific driver that should
> mask these errors.

If a specific device frequently signals the same errors,
I think that's a bug in that device, and if the vendor doesn't
provide a firmware update, quiescing the errors through a quirk
is a plausible solution.

Of course, if this is widespread, it becomes a maintenance nightmare
and then the quirk approach is not a viable option.  I cannot say
whether that's the case.  So far there's a report for one specific
product (the Samsung drive) and a hint that the problem may be
widespread.  It's difficult to make a recommendation without
precise data.

Thanks,

Lukas



