On Fri, Jan 31, 2025 at 6:44 AM Jonathan Cameron <Jonathan.Cameron@xxxxxxxxxx> wrote:
>
> On Sat, 25 Jan 2025 08:39:35 +0100
> Lukas Wunner <lukas@xxxxxxxxx> wrote:
> > Masking errors at the register level feels overzealous,
> > in particular because it also disables logging via tracepoints.
> >
> > Is there a concrete device that necessitates this change?
> > If there is, consider adding a quirk for this particular device
> > which masks specific errors, but doesn't affect other devices.
> > If there isn't, consider dropping this change until a buggy device
> > appears that actually needs it.
>
> Fully agree with this comment.  At very least this should default
> to not ratelimiting on the tracepoints unless a specific opt in has
> occurred (probably a platform or device driver quirk).

Hi Lukas and Jonathan,

Thanks for the comments. Since IRQ ratelimiting/masking is more drastic,
it requires more nuance and thought, so I split the series in v2 [1] as
a result.

I am not targeting specific devices per se. The intent is to allow
userspace daemons/agents to dynamically collect telemetry and handle
spam. In the datacenter context (i.e. several OCP members), there are
many deployments of new HW/configurations where we may see, or have
already seen, error spam when trying to enable native AER.

Kernel quirks work in the medium term (until the underlying device is
fixed), but they require a kernel rollout. There is a desire to address
this faster (i.e. without a rollout/reinstall), and I think IRQ
ratelimiting fits the requirements.

That said, I like the idea of having IRQ ratelimiting off by default,
as it is a big change.

> In particular I'd worry that you are masking whatever errors are
> finally trigger masking.  That might be the only one of that
> particular type that was seen and I think the only report we
> see is the 'I masked it message'.  So rasdaemon for example
> never sees the error at all.  So another tweak would be report
> one last time so we definitely see any given error type at least
> once.

Ack.

[1] https://lore.kernel.org/linux-pci/20250214023543.992372-1-pandoh@xxxxxxxxxx/

Thanks,
Jon