On Wed, 2023-12-06 at 10:30 -0600, Bjorn Helgaas wrote:
> [+cc Grant, Rajat, Rajat]
>
> On Wed, Dec 06, 2023 at 10:18:56AM +0800, Kai-Heng Feng wrote:
> > On Wed, Nov 15, 2023 at 5:00 AM Nirmal Patel <
> > nirmal.patel@xxxxxxxxxxxxxxx> wrote:
> > > On Wed, 2023-11-08 at 16:49 +0200, Kai-Heng Feng wrote:
> > > > On Wed, Nov 8, 2023 at 12:30 AM Bjorn Helgaas <
> > > > helgaas@xxxxxxxxxx> wrote:
> > ...
> > > > > I assume you mean to revert 04b12ef163d1 ("PCI: vmd: Honor
> > > > > ACPI _OSC on PCIe features"). That appeared in v5.17, and it
> > > > > fixed (or at least prevented) an AER message flood. We can't
> > > > > simply revert 04b12ef163d1 unless we first prevent that AER
> > > > > message flood in another way.
> > > >
> > > > The error is "correctable". Does masking all correctable AER
> > > > error by default make any sense? And add a sysfs knob to make it
> > > > optional.
> > >
> > > I assume sysfs knob requires driver reload. right? Can you send a
> > > patch?
> >
> > What I mean is to mask Correctable Errors by default on *all*
> > rootports, and create a new sysfs knob to let user decide if
> > Correctable Errors should be unmasked.
>
> I don't think we should mask Correctable Errors by default. Even
> though they've been corrected by hardware and no software action is
> required, I think these errors are valuable signals about Link
> integrity.
>
> I think rate-limiting and/or reporting on the *frequency* of
> Correctable Errors would make a lot of sense. We had some work toward
> this recently, but it hasn't quite gotten finished yet.
>
> The most recent work I'm aware of is this:
> https://lore.kernel.org/r/20230606035442.2886343-1-grundler@xxxxxxxxxxxx

Hi Kai-Heng, Bjorn,

I believe rate limiting alone will not fix the issue; it will only act
as a workaround. Without 04b12ef163d1, the VMD driver is not aware of
the OS native AER support setting and always enables AER, so we see
the AER flooding issue. That is a bug in the VMD driver.
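To make the point concrete, here is a minimal user-space sketch of the two behaviours. It is not the vmd.c code: the struct and all function names are hypothetical and only model the idea. The one value taken from the kernel is the 0x8 _OSC control bit, which matches OSC_PCI_EXPRESS_AER_CONTROL in include/linux/acpi.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* _OSC control bit for PCIe AER (same value as the kernel's
 * OSC_PCI_EXPRESS_AER_CONTROL). */
#define OSC_AER_CONTROL 0x00000008u

/* Hypothetical stand-in for a bridge's "native_*" ownership flags. */
struct bridge_flags {
	bool native_aer;	/* true: the OS owns AER on this bridge */
};

/* Root bridge: the OS owns AER only if firmware granted it via _OSC. */
static void root_bridge_from_osc(struct bridge_flags *root,
				 uint32_t osc_control)
{
	assert(root);
	root->native_aer = (osc_control & OSC_AER_CONTROL) != 0;
}

/* Assumed model of "honoring _OSC": the VMD bridge inherits the
 * root bridge's decision instead of making its own. */
static void vmd_inherit_flags(const struct bridge_flags *root,
			      struct bridge_flags *vmd)
{
	assert(root && vmd);
	vmd->native_aer = root->native_aer;
}

/* Assumed model of the behaviour without inheritance: the VMD bridge
 * claims native AER regardless of what firmware said. */
static void vmd_default_flags(struct bridge_flags *vmd)
{
	assert(vmd);
	vmd->native_aer = true;
}

/* The OS enables AER on a bridge only when it owns the feature. */
static bool os_enables_aer(const struct bridge_flags *bridge)
{
	return bridge->native_aer;
}
```

With the AER bit clear in the _OSC control word, the inheriting variant leaves AER disabled under VMD, while the default variant enables it anyway, which is the case where the flood shows up.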
There is a BIOS setting that allows us to enable OS native AER support
on the platform. It is located under EDK Menu -> Platform configuration
-> system event log -> IIO error enabling -> OS native AER support.

I have verified that this BIOS setting alters the native AER flag of
_OSC. We can also verify it on Kai-Heng's system.

I believe that instead of going in the direction of rate limiting, the
vmd driver should honor the OS native AER support setting.

Do you have any suggestions on this?

nirmal

>
> Bjorn