On Mon, Aug 29, 2011 at 09:46:09PM +1000, Anton Blanchard wrote:
>
> When testing on a 1024 thread ppc64 box I noticed a large amount of
> CPU time in ext4 code.
>
> ext4_has_free_blocks has a fast path to avoid summing every free and
> dirty block per cpu counter, but only if the global count shows more
> free blocks than the maximum amount that could be stored in all the
> per cpu counters.
>
> Since percpu_counter_batch scales with num_online_cpus() and the maximum
> amount in all per cpu counters is percpu_counter_batch * num_online_cpus(),
> this breakpoint grows at O(n^2).
>
> This issue will also hit with users of percpu_counter_compare which
> does a similar thing for one percpu counter.
>
> I chose to cap percpu_counter_batch at 1024 as a conservative first
> step, but we may want to reduce it further based on further benchmarking.
>
> Signed-off-by: Anton Blanchard <anton@xxxxxxxxx>

Applied to percpu/for-3.2.

Thanks.

-- 
tejun
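
For readers following the arithmetic in the quoted description, here is a rough userspace sketch of how the fast-path breakpoint scales. It is not kernel code. The batch rule max(32, 2 * ncpus) is an assumption about how percpu_counter_batch was computed at the time (the mail only says it scales with num_online_cpus()), and every function name below is made up for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* ASSUMPTION: batch = max(32, 2 * ncpus). The mail does not state the rule. */
static long long batch_uncapped(long long ncpus)
{
	long long b = 2 * ncpus;

	return b < 32 ? 32 : b;
}

/* Same rule with the cap of 1024 described in the quoted patch. */
static long long batch_capped(long long ncpus)
{
	long long b = batch_uncapped(ncpus);

	return b > 1024 ? 1024 : b;
}

/*
 * Largest amount that can sit unflushed in the per cpu counters:
 * batch * ncpus. The fast path only applies when the global count
 * exceeds this value.
 */
static long long fast_path_threshold(long long batch, long long ncpus)
{
	assert(batch > 0 && ncpus > 0);
	return batch * ncpus;
}

/* Hypothetical helper: print both breakpoints for one CPU count. */
static void print_thresholds(long long ncpus)
{
	printf("%lld cpus: uncapped %lld, capped %lld\n", ncpus,
	       fast_path_threshold(batch_uncapped(ncpus), ncpus),
	       fast_path_threshold(batch_capped(ncpus), ncpus));
}
```

Under that assumed rule, 1024 CPUs give an uncapped breakpoint of 2048 * 1024 = 2097152 and a capped one of 1024 * 1024 = 1048576. At 4096 CPUs the uncapped value is 33554432 (16x for 4x the CPUs), while the capped value grows linearly to 4194304.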