On 06/28/2017 03:54 PM, Jens Axboe wrote:
> On 06/28/2017 03:12 PM, Brian King wrote:
>> -static inline int part_in_flight(struct hd_struct *part)
>> +static inline unsigned long part_in_flight(struct hd_struct *part)
>>  {
>> -	return atomic_read(&part->in_flight[0]) + atomic_read(&part->in_flight[1]);
>> +	return part_stat_read(part, in_flight[0]) + part_stat_read(part, in_flight[1]);
>
> One obvious improvement would be to not do this twice, but only have to
> loop once. Instead of making this an array, make it a structure with a
> read and write count.
>
> It still doesn't really fix the issue of someone running on a kernel
> with a ton of possible CPUs configured. But it does reduce the overhead
> by 50%.

Or something as simple as this:

#define part_stat_read_double(part, field1, field2)			\
({									\
	typeof((part)->dkstats->field1) res = 0;			\
	unsigned int _cpu;						\
	for_each_possible_cpu(_cpu) {					\
		res += per_cpu_ptr((part)->dkstats, _cpu)->field1;	\
		res += per_cpu_ptr((part)->dkstats, _cpu)->field2;	\
	}								\
	res;								\
})

static inline unsigned long part_in_flight(struct hd_struct *part)
{
	return part_stat_read_double(part, in_flight[0], in_flight[1]);
}

-- 
Jens Axboe
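
For anyone who wants to try the single-loop idea outside the kernel, here is a
rough userspace sketch. It is not kernel code: struct disk_stats_sketch,
NR_CPUS_SKETCH, sum_two_passes() and sum_one_pass() are made-up names, and a
plain array stands in for the per-CPU area that per_cpu_ptr() would index. It
only illustrates that walking the CPUs once and reading both counters per slot
yields the same total as walking them twice.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the number of possible CPUs. */
#define NR_CPUS_SKETCH 4

/* Hypothetical stand-in for one CPU's slice of the per-CPU disk stats. */
struct disk_stats_sketch {
	unsigned long in_flight[2];	/* [0] = reads, [1] = writes */
};

/* Two passes: one full walk over all CPU slots for each counter. */
static unsigned long sum_two_passes(const struct disk_stats_sketch *stats,
				    size_t ncpus)
{
	unsigned long res = 0;
	size_t cpu;

	assert(stats != NULL);
	for (cpu = 0; cpu < ncpus; cpu++)
		res += stats[cpu].in_flight[0];
	for (cpu = 0; cpu < ncpus; cpu++)
		res += stats[cpu].in_flight[1];
	return res;
}

/* One pass: both counters are read while each CPU slot is visited. */
static unsigned long sum_one_pass(const struct disk_stats_sketch *stats,
				  size_t ncpus)
{
	unsigned long res = 0;
	size_t cpu;

	assert(stats != NULL);
	for (cpu = 0; cpu < ncpus; cpu++) {
		res += stats[cpu].in_flight[0];
		res += stats[cpu].in_flight[1];
	}
	return res;
}
```

Both helpers return the same sum for the same input; the one-pass variant
simply visits each slot once instead of twice, which is the property the
part_stat_read_double() macro relies on.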