A monitoring dashboard caught my attention when it displayed weird spikes
in computed interrupt rates. A machine under constant network load shows
about 250k interrupts per second, then every 4-5 hours there is a one-off
spike to 11 billion. Turns out, if you plot the interrupt counter, you get
a graph like this:

  [ASCII graph: the counter climbs steadily, then periodically drops by
   exactly 2^32 and resumes climbing]

While monitoring tools are typically equipped to handle counter
wrap-arounds, they may not be ready to handle dips like this.

What is the impact
------------------

Not much, actually. The counters always decrement by exactly 2^32 (which
is suggestive), so if you mask out the high bits of the counter and
consider only the low 32 bits, the value sequence actually makes sense,
given an appropriate sampling rate.

However, if you don't mask out the value and assume it to be accurate --
well, that assumption is incorrect. Interrupt sums might look correct and
contain some big number, but it could be arbitrarily distant from the
actual number of interrupts serviced since boot.

This concerns only the total value of the "intr" and "softirq" rows:

  intr 14390913189 32 11 0 0 238 0 0 0 0 0 0 0 88 0 [...]
  softirq 14625063745 0 596000256 300149 272619841 0 0 [...]
          ^^^^^^^^^^^ these ones

Why this happens
----------------

The reason for this behavior is that the "total" interrupt counters
presented by /proc/stat are actually computed by adding up per-interrupt
per-CPU counters. Most of these are "unsigned int", while some of them
are "unsigned long", and the accumulator is "u64". What a mess...

Individual counters are monotonically increasing (modulo wrapping);
however, if you add up multiple values with different bit widths, the sum
is *not* guaranteed to be monotonically increasing.

What can be done
----------------

1. Do nothing.
   Userspace can trivially compensate for this "curious" behavior by
   masking out the high bits, observing only the low sizeof(unsigned)
   part, and taking care to handle wrap-arounds. This maintains the
   status quo, but the "issue" of interrupt sums not being quite
   accurate remains.

2. Change the presentation type to the lowest common denominator, that
   is, unsigned int.

   Make the kernel mask out the not-quite-accurate bits from the value
   it reports, and keep it that way until every underlying counter type
   is changed to something wider.

   The benefit here is that users that *are* ready to handle proper
   wrap-arounds will be able to handle them automatically, without
   undocumented hacks (see option 1).

   This changes the observed value and will cause "unexpected"
   wrap-arounds to happen earlier in some use cases, which might upset
   users that are not ready to handle them, or don't want to poll
   /proc/stat more frequently. It's debatable what's better: a narrower
   value that might need to be polled more often, or a wider value that
   is not completely accurate.

3. Change the interrupt counter types to be wider.

   A different take on the issue: instead of narrowing the presentation
   from faux-u64 to unsigned int, widen the interrupt counters from
   unsigned int to... something else:

   - u64: interrupt counters are 64-bit everywhere, period
   - unsigned long: interrupt counters are 64-bit if the platform
     thinks that "long" is longer than "int"

   Whatever type is used, it must be the same for all interrupt
   counters across the kernel, as well as the type used to compute and
   display the sum of all these counters in /proc/stat.

   The advantage here is that 64-bit counters will probably be enough
   for *anything* not to overflow before the heat death of the
   universe, making the wrap-around problem irrelevant.

   The disadvantage here is that some hardware counters are 32-bit, and
   you can't make them wider.
   Some platforms also don't have proper atomic support for 64-bit
   integers, making wider counters problematic to implement
   efficiently.

So what do we do?
-----------------

I suggest wrapping the interrupt counter sum at "unsigned int", the same
type used for (most) individual counters. That makes for the most
predictable behavior.

I have a patch set cooking that does this. Would this be of any interest?
Or do you think changing the behavior of /proc/stat will cause more
trouble than it's worth?

Prior discussion
----------------

This question is by no means new; it has been discussed several times:

2019 - "genirq, proc: Speedup /proc/stat interrupt statistics"

  The issue of overflow and wrap-around was touched upon, with the
  suggestion that userspace should just deal with it. Using u64 for the
  sum was brought up too, but it did not go anywhere.

  https://lore.kernel.org/all/20190208143255.9dec696b15f03bf00f4c60c2@xxxxxxxxxxxxxxxxxxxx/
  https://lore.kernel.org/all/3460540b50784dca813a57ddbbd41656@xxxxxxxxxxxxxxxx/

2014 - "Why do we still have 32 bit counters? Interrupt counters
       overflow within 50 days"

  A discussion of whether it is appropriate to bump the counter width to
  64 bits in order to avoid the overflow issue entirely.

  https://lore.kernel.org/lkml/alpine.DEB.2.11.1410030435260.8324@xxxxxxxxxx/