On 06/28/2017 04:49 PM, Jens Axboe wrote:
> On 06/28/2017 03:12 PM, Brian King wrote:
>> This patch converts the in_flight counter in struct hd_struct from a
>> pair of atomics to a pair of percpu counters. This eliminates a couple
>> of atomics from the hot path. When running this on a Power system, to
>> a single null_blk device with 80 submission queues, irq mode 0, with
>> 80 fio jobs, I saw IOPs go from 1.5M IO/s to 11.4M IO/s.
>
> This has been done before, but I've never really liked it. The reason is
> that it means that reading the part stat inflight count now has to
> iterate over every possible CPU. Did you use partitions in your testing?
> How many CPUs were configured? When I last tested this a few years ago

I did not use partitions. I was running this on a 4-socket Power 8
machine with 5 cores per socket and 4 threads per core, so a total of
80 logical CPUs were usable in Linux.

I had missed the fact that part_round_stats_single calls part_in_flight;
previously I had only noticed the sysfs and procfs users of
part_in_flight.

-Brian

-- 
Brian King
Power Linux I/O
IBM Linux Technology Center

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel
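To make the trade-off discussed above concrete, here is a minimal
userspace sketch of per-CPU in-flight counters. It is not the patch and
does not use the kernel's percpu API. The names inflight_slot,
part_inflight, inflight_inc, inflight_dec and inflight_read are
hypothetical, NR_CPUS is fixed at 80 to match the machine described, and
the 128-byte CACHELINE value is an assumption. Updates touch only the
submitting CPU's slot, while a read has to walk every slot, which is the
cost Jens is pointing at.

```c
/* Userspace illustration only; all names here are hypothetical. */
#include <assert.h>

#define NR_CPUS   80   /* assumed: the 80 logical CPUs mentioned above */
#define CACHELINE 128  /* assumed cache line size, to avoid false sharing */

/* One slot per CPU, padded out to its own cache line. */
struct inflight_slot {
    _Alignas(CACHELINE) long count[2];  /* [0] = reads, [1] = writes */
};

static_assert(sizeof(struct inflight_slot) % CACHELINE == 0,
              "each slot should occupy whole cache lines");

struct part_inflight {
    struct inflight_slot cpu[NR_CPUS];
};

/* Hot path: touches only the caller's slot, no shared atomic. */
static inline void inflight_inc(struct part_inflight *p, int cpu, int rw)
{
    p->cpu[cpu].count[rw]++;
}

/* Completion may run on a different CPU than submission did. */
static inline void inflight_dec(struct part_inflight *p, int cpu, int rw)
{
    p->cpu[cpu].count[rw]--;
}

/* Slow path: the reader must visit every possible CPU's slot. */
static inline long inflight_read(const struct part_inflight *p)
{
    long sum = 0;

    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        sum += p->cpu[cpu].count[0] + p->cpu[cpu].count[1];
    return sum;
}
```

Because a request can be submitted on one CPU and completed on another,
an individual slot can go negative; only the sum over all slots is
meaningful. The sketch also ignores concurrency, so a real reader would
at best see an approximate snapshot. If the read sits on a frequently
called path, as part_round_stats_single does via part_in_flight, the
loop over all CPUs can eat into what the cheaper updates gain.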