On 25/01/2025 08:39, Lukas Wunner wrote:
> Masking errors at the register level feels overzealous,
> in particular because it also disables logging via tracepoints.
> Is there a concrete device that necessitates this change?
I faced issues with excessive Correctable Error reporting on a Samsung
PM1733 NVMe drive (a couple of thousand errors per hour), which kept
polluting the logs even after I introduced a ratelimit (first every 2s,
second every 30s, as proposed in [1]). Also, instead of masking the
errors automatically, we could give the user a sysfs knob to turn error
generation off and on.
> If there is, consider adding a quirk for this particular device
> which masks specific errors, but doesn't affect other devices.
There were many other reports of Correctable Error floods, as noted
in the cover letter, so it's hard to pinpoint the specific driver that
should mask these errors.
All the best,
Karolina
-------------------------------------
[1] https://lore.kernel.org/linux-pci/cover.1736341506.git.karolina.stolarek@xxxxxxxxxx/
> If there isn't, consider dropping this change until a buggy device
> appears that actually needs it.
>
> Thanks,
>
> Lukas