On 06/28/2017 04:07 PM, Brian King wrote:
> On 06/28/2017 04:59 PM, Jens Axboe wrote:
>> On 06/28/2017 03:54 PM, Jens Axboe wrote:
>>> On 06/28/2017 03:12 PM, Brian King wrote:
>>>> -static inline int part_in_flight(struct hd_struct *part)
>>>> +static inline unsigned long part_in_flight(struct hd_struct *part)
>>>>  {
>>>> -	return atomic_read(&part->in_flight[0]) + atomic_read(&part->in_flight[1]);
>>>> +	return part_stat_read(part, in_flight[0]) + part_stat_read(part, in_flight[1]);
>>>
>>> One obvious improvement would be to not do this twice, but only have to
>>> loop once. Instead of making this an array, make it a structure with a
>>> read and write count.
>>>
>>> It still doesn't really fix the issue of someone running on a kernel
>>> with a ton of possible CPUs configured. But it does reduce the overhead
>>> by 50%.
>>
>> Or something as simple as this:
>>
>> #define part_stat_read_double(part, field1, field2) \
>> ({ \
>> 	typeof((part)->dkstats->field1) res = 0; \
>> 	unsigned int _cpu; \
>> 	for_each_possible_cpu(_cpu) { \
>> 		res += per_cpu_ptr((part)->dkstats, _cpu)->field1; \
>> 		res += per_cpu_ptr((part)->dkstats, _cpu)->field2; \
>> 	} \
>> 	res; \
>> })
>>
>> static inline unsigned long part_in_flight(struct hd_struct *part)
>> {
>> 	return part_stat_read_double(part, in_flight[0], in_flight[1]);
>> }
>>
>
> I'll give this a try and also see about running some more exhaustive
> runs to see if there are any cases where we go backwards in performance.
>
> I'll also run with partitions and see how that impacts this.

And do something nuts, like setting NR_CPUS to 512 or whatever. What do
distros ship with?

-- 
Jens Axboe