[+cc Jon, Karolina]

On Wed, Jan 08, 2025 at 03:57:03PM +0800, Bijie Xu wrote:
> Sometimes certain PCIE devices installed on some servers occasionally
> produce large number of AER correctable error logs, which is quite
> annoying. Add this sysctl parameter kernel.aer_print_skip_mask to
> skip printing AER errors of certain severity.
>
> The AER severity can be 0(NONFATAL), 1(FATAL), 2(CORRECTABLE). The 3
> low bits of the mask are used to skip these 3 severities. Set bit 0
> can skip printing NONFATAL AER errors, and set bit 1 can skip printing
> FATAL AER errors, set bit 2 can skip printing CORRECTABLE AER errors.
> And multiple bits can be set to skip multiple severities.

This is definitely annoying, actually MORE than annoying in some
cases.  I'm hoping the correctable error rate-limiting work can reduce
the annoyance to a tolerable level:

  https://lore.kernel.org/r/20250214023543.992372-1-pandoh@xxxxxxxxxx

Can you take a look at this and see if it's going the right direction
for you, or if it needs extensions to do what you need?

> Signed-off-by: Bijie Xu <bijie.xu@xxxxxxxxxxxx>
> ---
>  drivers/pci/pcie/aer.c | 23 +++++++++++++++++++++++
>  1 file changed, 23 insertions(+)
>
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index 80c5ba8d8296..b46973526bcf 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -698,6 +698,7 @@ static void __aer_print_error(struct pci_dev *dev,
>  	pci_dev_aer_stats_incr(dev, info);
>  }
>
> +unsigned int aer_print_skip_mask __read_mostly;
>  void aer_print_error(struct pci_dev *dev, struct aer_err_info *info)
>  {
>  	int layer, agent;
> @@ -710,6 +711,9 @@ void aer_print_error(struct pci_dev *dev, struct aer_err_info *info)
>  		goto out;
>  	}
>
> +	if ((1 << info->severity) & aer_print_skip_mask)
> +		goto out;
> +
>  	layer = AER_GET_LAYER_ERROR(info->severity, info->status);
>  	agent = AER_GET_AGENT(info->severity, info->status);
>
> @@ -1596,3 +1600,22 @@ int __init pcie_aer_init(void)
>  		return -ENXIO;
>  	return pcie_port_service_register(&aerdriver);
>  }
> +
> +static const struct ctl_table aer_print_skip_mask_sysctls[] = {
> +	{
> +		.procname = "aer_print_skip_mask",
> +		.data = &aer_print_skip_mask,
> +		.maxlen = sizeof(unsigned int),
> +		.mode = 0644,
> +		.proc_handler = &proc_douintvec,
> +	},
> +	{}
> +};
> +
> +static int __init aer_print_skip_mask_sysctl_init(void)
> +{
> +	register_sysctl_init("kernel", aer_print_skip_mask_sysctls);
> +	return 0;
> +}
> +
> +late_initcall(aer_print_skip_mask_sysctl_init);
> --
> 2.25.1
>
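
For reference, here is a minimal userspace sketch of the mask semantics
as I read them from the commit log, assuming the severity values quoted
there (0 = NONFATAL, 1 = FATAL, 2 = CORRECTABLE).  The enum names and
aer_skip_print() are hypothetical, made up for illustration; only the
bit test itself mirrors the hunk added to aer_print_error().  This is
not kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Severity values as listed in the commit log (names are hypothetical). */
enum sketch_aer_severity {
	SEV_NONFATAL	= 0,
	SEV_FATAL	= 1,
	SEV_CORRECTABLE	= 2,
};

/*
 * Hypothetical helper: returns true when printing should be skipped.
 * Same test as "(1 << info->severity) & aer_print_skip_mask" in the
 * patch, written with an unsigned shift.
 */
static bool aer_skip_print(unsigned int severity, unsigned int skip_mask)
{
	return ((1U << severity) & skip_mask) != 0;
}

/* Small self-check of the bit-to-severity mapping. */
static void aer_skip_print_selfcheck(void)
{
	/* Mask 0x4 (bit 2): only CORRECTABLE is skipped. */
	assert(aer_skip_print(SEV_CORRECTABLE, 0x4));
	assert(!aer_skip_print(SEV_NONFATAL, 0x4));
	assert(!aer_skip_print(SEV_FATAL, 0x4));

	/* Mask 0x5 (bits 0 and 2): NONFATAL and CORRECTABLE are skipped. */
	assert(aer_skip_print(SEV_NONFATAL, 0x5));
	assert(aer_skip_print(SEV_CORRECTABLE, 0x5));
	assert(!aer_skip_print(SEV_FATAL, 0x5));

	/* Mask 0: nothing is skipped. */
	assert(!aer_skip_print(SEV_FATAL, 0x0));
}
```

With this reading, writing 4 to kernel.aer_print_skip_mask would
silence only correctable errors, and 5 would silence correctable and
nonfatal ones while still printing fatal errors.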