On Wed, May 17, 2023 at 9:03 AM Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
>
> On Fri, Apr 07, 2023 at 04:46:03PM -0700, Grant Grundler wrote:
> > On Fri, Apr 7, 2023 at 12:46 PM Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
> > > On Fri, Apr 07, 2023 at 11:53:27AM -0700, Grant Grundler wrote:
> > > > On Thu, Apr 6, 2023 at 12:50 PM Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
> > > > > On Fri, Mar 17, 2023 at 10:51:09AM -0700, Grant Grundler wrote:
> > > > > > From: Rajat Khandelwal <rajat.khandelwal@xxxxxxxxxxxxxxx>
> > > > > >
> > > > > > There are many instances where correctable errors tend to inundate
> > > > > > the message buffer. We observe such instances during thunderbolt PCIe
> > > > > > tunneling.
> > > > ...
> > > >
> > > > > >                 if (info->severity == AER_CORRECTABLE)
> > > > > > -                       pci_info(dev, " [%2d] %-22s%s\n", i, errmsg,
> > > > > > -                               info->first_error == i ? " (First)" : "");
> > > > > > +                       pci_info_ratelimited(dev, " [%2d] %-22s%s\n", i, errmsg,
> > > > > > +                               info->first_error == i ? " (First)" : "");
> > > > >
> > > > > I don't think this is going to reliably work the way we want. We have
> > > > > a bunch of pci_info_ratelimited() calls, and each caller has its own
> > > > > ratelimit_state data. Unless we call pci_info_ratelimited() exactly
> > > > > the same number of times for each error, the ratelimit counters will
> > > > > get out of sync and we'll end up printing fragments from error A mixed
> > > > > with fragments from error B.
> > > >
> > > > Ok - what I'm reading between the lines here is the output should be
> > > > emitted in one step, not multiple pci_info_ratelimited() calls. If the
> > > > code built an output string (using snprintf()), and then called
> > > > pci_info_ratelimited() exactly once at the bottom, would that be
> > > > sufficient?
> > > >
> > > > > I think we need to explicitly manage the ratelimiting ourselves,
> > > > > similar to print_hmi_event_info() or print_extlog_rcd(). Then we can
> > > > > have a *single* ratelimit_state, and we can check it once to determine
> > > > > whether to log this correctable error.
> > > >
> > > > Is the rate limiting per call location or per device? From above, I
> > > > understood rate limiting is "per call location". If the code only
> > > > has one call location, it should achieve the same goal, right?
> > >
> > > Rate-limiting is per call location, so yes, if we only have one call
> > > location, that would solve it. It would also have the nice property
> > > that all the output would be atomic so it wouldn't get mixed with
> > > other stuff, and it might encourage us to be a little less wordy in
> > > the output.
> >
> > +1 to all of those reasons. Especially reducing the number of lines output.
> >
> > I'm going to be out for the next week. If someone else (Rajat Khandelwal
> > maybe?) wants to rework this to use one call location it should be fairly
> > straightforward. If not, I'll tackle this when I'm back (in 2 weeks
> > essentially).
>
> Ping? Really hoping to merge this for v6.5.

Sorry - I forgot about this... I'll take a shot at it. Should have
something by this evening.

cheers,
grant

>
> Bjorn
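
For illustration only, here is a minimal userspace sketch of the "single
ratelimit_state, checked once, output built in one buffer" idea discussed
above. It is not the kernel patch. struct rl_state, rl_allow(),
report_correctable() and the error-name table are hypothetical stand-ins,
and time is passed in as a plain tick counter. In the kernel the
corresponding pieces would presumably be DEFINE_RATELIMIT_STATE() and
__ratelimit(), as in the functions Bjorn mentions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical userspace stand-in for the kernel's struct ratelimit_state. */
struct rl_state {
	long interval;	/* window length, in caller-defined ticks */
	int burst;	/* reports allowed per window */
	long begin;	/* tick at which the current window started */
	int printed;	/* reports emitted in the current window */
	int missed;	/* reports suppressed in the current window */
};

/* Return true if one more report may be emitted at tick `now`. */
static bool rl_allow(struct rl_state *rs, long now)
{
	assert(rs != NULL);
	if (now - rs->begin >= rs->interval) {
		/* The old window expired: start a new one. */
		rs->begin = now;
		rs->printed = 0;
		rs->missed = 0;
	}
	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}
	rs->missed++;
	return false;
}

/* Advance `off` by what snprintf() actually stored (it returns the
 * untruncated length), so `off` always stays below `len`. */
static size_t advance(size_t off, size_t len, int n)
{
	size_t room = len - off;

	if (n < 0)
		return off;
	return off + ((size_t)n < room ? (size_t)n : room - 1);
}

/*
 * Build the whole report in `buf` after a single rate-limit check, so a
 * report is either emitted completely or suppressed completely.
 * Returns the number of characters stored, or 0 if suppressed.
 * The names and bit positions below are illustrative, not the AER layout.
 */
static int report_correctable(struct rl_state *rs, long now,
			      char *buf, size_t len, const char *dev,
			      unsigned int status, int first_error)
{
	static const char *const names[] = { "RxErr", "BadTLP", "BadDLLP" };
	size_t off = 0;

	assert(buf != NULL && len > 0);
	if (!rl_allow(rs, now))
		return 0;	/* the one and only rate-limit decision */

	off = advance(off, len, snprintf(buf, len,
		"%s: correctable error, status=0x%x\n", dev, status));
	for (int i = 0; i < 3; i++) {
		if (!(status & (1u << i)))
			continue;
		off = advance(off, len, snprintf(buf + off, len - off,
			" [%2d] %-22s%s\n", i, names[i],
			first_error == i ? " (First)" : ""));
	}
	return (int)off;	/* caller prints buf with one call */
}
```

With burst = 1 and interval = 5, a report at tick 0 is built, a report at
tick 1 is dropped as a whole (no fragments), and a report at tick 5 is
built again because a new window has started.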