On 06/18, Frederic Weisbecker wrote:
>
> On Tue, Jun 18, 2013 at 04:42:25PM +0200, Oleg Nesterov wrote:
> >
> > Simplest example,
> >
> > 	for_each_possible_cpu(cpu)
> > 		total_count += per_cpu(per_cpu_count, cpu);
> >
> > Every per_cpu() likely means a cache miss. Not to mention we need the
> > additional math to calculate the address of the local counter.
> >
> > 	for_each_possible_cpu(cpu)
> > 		total_count += bootmem_or_kmalloc_array[cpu];
> >
> > is much better in this respect.
> >
> > And note also that per_cpu_count above can share the cacheline with
> > another "hot" per-cpu variable.
>
> Ah I see, that's good to know.
>
> But these variables are supposed to only be touched from slow path
> (perf events syscall, ptrace breakpoints creation, etc...), right?
> So this is probably not a problem?

Yes, sure. But please note that this can also penalize other CPUs.
For example, toggle_bp_slot() writes to per_cpu(nr_cpu_bp_pinned),
this invalidates the cacheline which can contain another per-cpu
variable.

But let me clarify. I agree, this all is minor, I am not trying to
say this change can actually improve the performance.

The main point of this patch is to make the code look a bit better,
and you seem to agree. The changelog mentions s/percpu/array/ only
as a potential change which obviously needs more discussion, I didn't
mean that we should necessarily do this.

Although yes, personally I really dislike per-cpu in this case, but
of course this is subjective and I won't argue ;)

Oleg.
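
For illustration only, a user-space C sketch of the layout argument
above. It is not kernel code and not part of the patch. Everything in
it is an assumption made for the example: the 64-byte cacheline, the
8 CPUs, the 4096-byte per-CPU area, and all names (fake_pcpu_area,
hot_neighbour, percpu_lines_touched() and so on are hypothetical).
It only models the idea that each CPU's copy of a per-cpu variable
sits in that CPU's own area, while a plain array packs the counters
together.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS   8     /* assumed CPU count */
#define CACHELINE 64    /* assumed cacheline size in bytes */
#define PCPU_AREA 4096  /* assumed size of one CPU's per-cpu area */

/* Hypothetical model of one CPU's per-cpu area. */
struct fake_pcpu_area {
	char pad0[128];
	int hot_neighbour;  /* some other, frequently used per-cpu variable */
	int count;          /* the counter we want to sum over all CPUs */
	char pad1[PCPU_AREA - 128 - 2 * sizeof(int)];
};

static _Alignas(CACHELINE) struct fake_pcpu_area areas[NR_CPUS];
static _Alignas(CACHELINE) int counter_array[NR_CPUS];

/* True if both addresses fall into the same cacheline. */
static int shares_line(const void *a, const void *b)
{
	return (uintptr_t)a / CACHELINE == (uintptr_t)b / CACHELINE;
}

/* Count distinct cachelines covered by n addresses (n is tiny). */
static int distinct_lines(int *const *ptrs, int n)
{
	int i, j, lines = 0;

	assert(n > 0);
	for (i = 0; i < n; i++) {
		for (j = 0; j < i; j++)
			if (shares_line(ptrs[i], ptrs[j]))
				break;
		if (j == i)
			lines++;
	}
	return lines;
}

/* Cachelines read when summing the per-cpu style counters. */
static int percpu_lines_touched(void)
{
	int *ptrs[NR_CPUS];
	int cpu;

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		ptrs[cpu] = &areas[cpu].count;
	return distinct_lines(ptrs, NR_CPUS);
}

/* Cachelines read when summing the contiguous array. */
static int array_lines_touched(void)
{
	int *ptrs[NR_CPUS];
	int cpu;

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		ptrs[cpu] = &counter_array[cpu];
	return distinct_lines(ptrs, NR_CPUS);
}
```

Under these assumptions the per-cpu style sum touches one cacheline
per CPU, the array sum touches a single line, and a write to
areas[cpu].count dirties the line that also holds hot_neighbour.
The flip side, which the sketch also makes visible, is that all
counters in the array share one line, so per-CPU writers would
contend on it. That is fine only for slow-path counters.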