[+cc Grant, Rajat, Rajat]

On Wed, Dec 06, 2023 at 10:18:56AM +0800, Kai-Heng Feng wrote:
> On Wed, Nov 15, 2023 at 5:00 AM Nirmal Patel <nirmal.patel@xxxxxxxxxxxxxxx> wrote:
> > On Wed, 2023-11-08 at 16:49 +0200, Kai-Heng Feng wrote:
> > > On Wed, Nov 8, 2023 at 12:30 AM Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
> ...
> > > > I assume you mean to revert 04b12ef163d1 ("PCI: vmd: Honor
> > > > ACPI _OSC on PCIe features"). That appeared in v5.17, and it
> > > > fixed (or at least prevented) an AER message flood. We can't
> > > > simply revert 04b12ef163d1 unless we first prevent that AER
> > > > message flood in another way.
> > >
> > > The error is "correctable". Does masking all correctable AER
> > > error by default make any sense? And add a sysfs knob to make it
> > > optional.
> >
> > I assume sysfs knob requires driver reload. right? Can you send a
> > patch?
>
> What I mean is to mask Correctable Errors by default on *all*
> rootports, and create a new sysfs knob to let user decide if
> Correctable Errors should be unmasked.

I don't think we should mask Correctable Errors by default. Even
though they've been corrected by hardware and no software action is
required, I think these errors are valuable signals about Link
integrity.

I think rate-limiting and/or reporting on the *frequency* of
Correctable Errors would make a lot of sense. We had some work toward
this recently, but it hasn't quite gotten finished yet.

The most recent work I'm aware of is this:
https://lore.kernel.org/r/20230606035442.2886343-1-grundler@xxxxxxxxxxxx

Bjorn