v1: https://lkml.org/lkml/2019/1/7/899

v2:
 - Fix a minor bug in patch 4 & update the cover-letter.

As newer systems have more and more IRQs and CPUs available, the
performance of reading /proc/stat frequently is getting worse and worse.

It appears that the idea in the v1 patch of caching the IRQ counts to
reduce the frequency of percpu summation, with a sysctl parameter to
control it, was not well received.

I have looked into the use of percpu counters for counting interrupts.
However, the following are the reasons why I don't think percpu counters
are the right choice for doing that.

 1) There is a raw spinlock in the percpu_counter structure that may
    need to be acquired in the update path. This can be a performance
    drag, especially if lockdep is enabled.

 2) The percpu_counter structure is 40 bytes in size on 64-bit systems,
    compared with just 8 bytes for the percpu count pointer plus the
    additional 4 bytes that I introduced in patch 2, which may not
    actually increase the size of the IRQ descriptor. With thousands of
    IRQ descriptors, percpu counters can consume quite a lot more
    memory. Memory consumption was a point that had been brought up in
    the v1 patch review.

 3) Reading the patch 4 commit log, one can see that quite a few CPU
    cycles were spent looking up the radix tree to locate the IRQ
    descriptors for each of the interrupts. That overhead will still be
    there even if I use percpu counters, so using percpu counters alone
    won't be as performant as this patchset or my previous v1 patch.
    Patch 4 optimizes the descriptor lookup process, which is
    independent of the percpu counter choice.

 4) Patches 2 and 3 are the patches that modify the percpu counting
    aspect of the IRQ counts. Together they change only 14 lines of
    code, so they are very simple changes.

This new patchset optimizes the way the IRQ counts are retrieved and
gets rid of the sysctl parameter altogether, achieving a performance
gain that is close to that of the v1 patch.
This is based on the idea that while many IRQs can be supported by a
system, only a handful of them are actually being used in most cases.
We can save a lot of time by focusing on those active IRQs only and
ignoring the rest.

Patch 1 is the same as that in v1 while the other 3 patches are new.

Waiman Long (4):
  /proc/stat: Extract irqs counting code into show_stat_irqs()
  /proc/stat: Only do percpu sum of active IRQs
  genirq: Track the number of active IRQs
  /proc/stat: Call kstat_irqs_usr() only for active IRQs

 fs/proc/stat.c          | 123 ++++++++++++++++++++++++++++++++++++++++++++----
 include/linux/irqdesc.h |   1 +
 kernel/irq/internals.h  |   6 ++-
 kernel/irq/irqdesc.c    |   7 ++-
 4 files changed, 125 insertions(+), 12 deletions(-)

-- 
1.8.3.1
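
For illustration only, the "active IRQs" idea described in this cover
letter can be sketched in userspace C as below. This is not the code
from the patches: the demo_* names, the fixed-size arrays standing in
for percpu counters, and the `active` flag are all assumptions made for
the sketch. It only shows why skipping never-fired IRQs avoids most of
the percpu summation work.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sizes for the sketch; real systems have far more. */
#define DEMO_NR_CPUS 4
#define DEMO_NR_IRQS 8

/* Stand-in for an IRQ descriptor; not the kernel's struct irq_desc. */
struct demo_irq {
	uint32_t percpu_count[DEMO_NR_CPUS]; /* one counter per CPU */
	int active;                          /* set on first interrupt */
};

struct demo_irq demo_irqs[DEMO_NR_IRQS];
unsigned int demo_nr_active;                 /* number of active IRQs */

/* Record one interrupt for @irq on @cpu, marking the IRQ as active. */
void demo_irq_fire(unsigned int irq, unsigned int cpu)
{
	assert(irq < DEMO_NR_IRQS && cpu < DEMO_NR_CPUS);
	if (!demo_irqs[irq].active) {
		demo_irqs[irq].active = 1;
		demo_nr_active++;
	}
	demo_irqs[irq].percpu_count[cpu]++;
}

/* Sum the per-CPU counts of @irq; inactive IRQs cost no summation. */
uint64_t demo_irq_sum(unsigned int irq)
{
	uint64_t sum = 0;
	unsigned int cpu;

	assert(irq < DEMO_NR_IRQS);
	if (!demo_irqs[irq].active)
		return 0;
	for (cpu = 0; cpu < DEMO_NR_CPUS; cpu++)
		sum += demo_irqs[irq].percpu_count[cpu];
	return sum;
}

/* Total over all IRQs, stopping once every active IRQ has been seen. */
uint64_t demo_irq_total(void)
{
	uint64_t total = 0;
	unsigned int irq, seen = 0;

	for (irq = 0; irq < DEMO_NR_IRQS && seen < demo_nr_active; irq++) {
		if (!demo_irqs[irq].active)
			continue;
		total += demo_irq_sum(irq);
		seen++;
	}
	return total;
}
```

In this sketch the cost of producing the totals scales with the number
of IRQs that have ever fired rather than with the number of IRQs the
system can support.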