On Wed, Jan 09, 2019 at 01:54:36PM -0500, Waiman Long wrote:
> If you read patch 4, you can see that quite a bit of CPU cycles was
> spent looking up the radix tree to locate the IRQ descriptor for each of
> the interrupts. Those overhead will still be there even if I use percpu
> counters. So using percpu counter alone won't be as performant as this
> patch or my previous v1 patch.

Hm, if that's the overhead, then the radix tree (and the XArray) have
APIs that can reduce that overhead.  Right now, there's only one caller
of kstat_irqs_usr() (the proc code).  If we change that to fill an array
instead of returning a single value, it can look something like this:

void kstat_irqs_usr(unsigned int *sums)
{
	XA_STATE(xas, &irq_descs, 0);
	struct irq_desc *desc;
	int cpu;

	/* The advanced XArray API needs the RCU read lock (or the xa_lock) */
	rcu_read_lock();
	/* One walk over all present descriptors, no per-IRQ lookup */
	xas_for_each(&xas, desc, ULONG_MAX) {
		unsigned int sum = 0;

		if (!desc->kstat_irqs)
			continue;
		for_each_possible_cpu(cpu)
			sum += *per_cpu_ptr(desc->kstat_irqs, cpu);

		/* xas is a struct, not a pointer; the index is the IRQ number */
		sums[xas.xa_index] = sum;
	}
	rcu_read_unlock();
}
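
To make the shape of that interface concrete, here is a minimal userspace
model of the fill-an-array idea.  It is only a sketch under stated
assumptions, not kernel code: struct irq_desc_sketch, kstat_irqs_fill(),
NR_CPUS_SKETCH and the has_stats flag are all made-up names, a flat array
of descriptors stands in for the XArray, and a plain array stands in for
the percpu counters.  The point it illustrates is that the caller gets
every sum from one pass over the descriptors instead of one lookup per IRQ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical CPU count; the kernel would use for_each_possible_cpu(). */
#define NR_CPUS_SKETCH 4

/* Hypothetical stand-in for struct irq_desc. */
struct irq_desc_sketch {
	unsigned int irq;                        /* IRQ number (the index) */
	int has_stats;                           /* models desc->kstat_irqs != NULL */
	unsigned int kstat_irqs[NR_CPUS_SKETCH]; /* models the percpu counters */
};

/*
 * Walk every descriptor once and store the per-IRQ total in sums[irq].
 * Entries for IRQs without a descriptor or without stats are left alone,
 * so the caller is expected to zero sums[] beforehand.
 */
void kstat_irqs_fill(const struct irq_desc_sketch *descs, size_t nr_descs,
		     unsigned int *sums, size_t nr_sums)
{
	size_t i;
	int cpu;

	assert(descs != NULL && sums != NULL);

	for (i = 0; i < nr_descs; i++) {
		const struct irq_desc_sketch *desc = &descs[i];
		unsigned int sum = 0;

		/* Skip descriptors with no stats or an out-of-range IRQ. */
		if (!desc->has_stats || desc->irq >= nr_sums)
			continue;

		for (cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
			sum += desc->kstat_irqs[cpu];

		sums[desc->irq] = sum;
	}
}
```

In this model the proc code would zero a sums[] array sized for the highest
IRQ number, call kstat_irqs_fill() once, and then print sums[irq] for each
line of output.  How the real caller would size and allocate that array is
left open here.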