On 01/09/2019 02:59 PM, Matthew Wilcox wrote:
> On Wed, Jan 09, 2019 at 01:54:36PM -0500, Waiman Long wrote:
>> If you read patch 4, you can see that quite a few CPU cycles were
>> spent looking up the radix tree to locate the IRQ descriptor for each
>> of the interrupts. That overhead will still be there even if I use
>> percpu counters, so using percpu counters alone won't be as performant
>> as this patch or my previous v1 patch.
>
> Hm, if that's the overhead, then the radix tree (and the XArray) have
> APIs that can reduce that overhead. Right now, there's only one caller
> of kstat_irqs_usr() (the proc code). If we change that to fill an array
> instead of returning a single value, it can look something like this:
>
> void kstat_irqs_usr(unsigned int *sums)
> {
> 	XA_STATE(xas, &irq_descs, 0);
> 	struct irq_desc *desc;
> 	int cpu;
>
> 	xas_for_each(&xas, desc, ULONG_MAX) {
> 		unsigned int sum = 0;
>
> 		if (!desc->kstat_irqs)
> 			continue;
> 		for_each_possible_cpu(cpu)
> 			sum += *per_cpu_ptr(desc->kstat_irqs, cpu);
>
> 		sums[xas.xa_index] = sum;
> 	}
> }

OK, I will try something like that as a replacement for patch 4 to see
how it compares with my current patch.

Thanks,
Longman
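
A minimal sketch of how the single proc-side caller might consume the
array-filling variant proposed above. It assumes the new
kstat_irqs_usr(unsigned int *sums) prototype from the quoted mail, an
nr_irqs-sized buffer, and a made-up helper name (alloc_irq_sums); none
of this is taken from either patch series.

/*
 * Hedged sketch only: batch the per-IRQ sums with one walk over the
 * descriptor tree instead of one radix-tree lookup per interrupt.
 * The helper name, the kcalloc() allocation, and the nr_irqs bound are
 * assumptions for illustration.
 */
#include <linux/slab.h>
#include <linux/irqnr.h>
#include <linux/kernel_stat.h>

static unsigned int *alloc_irq_sums(void)
{
	/* One slot per possible IRQ number; zeroed so unused slots read 0. */
	unsigned int *sums = kcalloc(nr_irqs, sizeof(*sums), GFP_KERNEL);

	if (!sums)
		return NULL;

	/* Proposed array-filling kstat_irqs_usr() from the mail above. */
	kstat_irqs_usr(sums);
	return sums;
}

The caller would then index sums[irq] while printing /proc/interrupts
and kfree() the buffer when done.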