On 06/28/2017 04:59 PM, Jens Axboe wrote:
> On 06/28/2017 03:54 PM, Jens Axboe wrote:
>> On 06/28/2017 03:12 PM, Brian King wrote:
>>> -static inline int part_in_flight(struct hd_struct *part)
>>> +static inline unsigned long part_in_flight(struct hd_struct *part)
>>> {
>>> -	return atomic_read(&part->in_flight[0]) + atomic_read(&part->in_flight[1]);
>>> +	return part_stat_read(part, in_flight[0]) + part_stat_read(part, in_flight[1]);
>>
>> One obvious improvement would be to not do this twice, but only have to
>> loop once. Instead of making this an array, make it a structure with a
>> read and write count.
>>
>> It still doesn't really fix the issue of someone running on a kernel
>> with a ton of possible CPUs configured. But it does reduce the overhead
>> by 50%.
>
> Or something as simple as this:
>
> #define part_stat_read_double(part, field1, field2)			\
> ({									\
> 	typeof((part)->dkstats->field1) res = 0;			\
> 	unsigned int _cpu;						\
> 	for_each_possible_cpu(_cpu) {					\
> 		res += per_cpu_ptr((part)->dkstats, _cpu)->field1;	\
> 		res += per_cpu_ptr((part)->dkstats, _cpu)->field2;	\
> 	}								\
> 	res;								\
> })
>
> static inline unsigned long part_in_flight(struct hd_struct *part)
> {
> 	return part_stat_read_double(part, in_flight[0], in_flight[1]);
> }
>

I'll give this a try and also see about running some more exhaustive
runs to see if there are any cases where we go backwards in performance.

I'll also run with partitions and see how that impacts this.

Thanks,

Brian

-- 
Brian King
Power Linux I/O
IBM Linux Technology Center