On 4 Oct 2022 11:17:48 -0400 Waiman Long <longman@xxxxxxxxxx>
> For a system with many CPUs and block devices, the time to do
> blkcg_rstat_flush() from cgroup_rstat_flush() can be rather long. It
> can be especially problematic as interrupts are disabled during the
> flush. It was reported that it might take seconds to complete in some
> extreme cases, leading to hard lockup messages.
>
> As it is likely that not all the percpu blkg_iostat_set's have been
> updated since the last flush, those stale blkg_iostat_set's don't need
> to be flushed in this case. This patch optimizes blkcg_rstat_flush()
> by keeping a lockless list of recently updated blkg_iostat_set's in a
> newly added percpu blkcg->lhead pointer.
>
> The blkg_iostat_set is added to a sentinel lockless list on the update
> side in blk_cgroup_bio_start(). It is removed from the sentinel
> lockless list when flushed in blkcg_rstat_flush(). Due to racing, it
> is possible that blkg_iostat_set's in the lockless list may have no
> new IO stats to be flushed, but that is OK.

So it is likely that another flag, updated when a bis is added to or
deleted from the llist, can cut 1/3 off without making your patch
overly complicated.

> struct blkg_iostat_set {
> 	struct u64_stats_sync		sync;
> +	struct llist_node		lnode;
> +	struct blkcg_gq			*blkg;
+	atomic_t			queued;
> 	struct blkg_iostat		cur;
> 	struct blkg_iostat		last;
> };
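
Concretely, something like the untested sketch below is what I have in
mind. The helper name blkg_queue_stat() and the exact shape of the
flush loop are my own invention; only the llist/atomic primitives and
the percpu blkcg->lhead list come from your patch.

#include <linux/atomic.h>
#include <linux/llist.h>
#include <linux/percpu.h>

/* update side, called from blk_cgroup_bio_start() after the stats
 * are updated; only the 0 -> 1 transition of ->queued wins the right
 * to add the node, so a bis already on the percpu llist is never
 * queued twice */
static void blkg_queue_stat(struct blkcg *blkcg, struct blkg_iostat_set *bis)
{
	struct llist_head *lhead = this_cpu_ptr(blkcg->lhead);

	if (atomic_cmpxchg(&bis->queued, 0, 1) == 0)
		llist_add(&bis->lnode, lhead);
}

/* flush side, inside the per-cpu loop of blkcg_rstat_flush() */
	struct llist_head *lhead = per_cpu_ptr(blkcg->lhead, cpu);
	struct llist_node *pos = llist_del_all(lhead);
	struct blkg_iostat_set *bis, *next;

	llist_for_each_entry_safe(bis, next, pos, lnode) {
		/* clear ->queued (atomic_xchg is a full barrier)
		 * before reading the stats, so a racing update
		 * re-queues the bis for the next flush instead of
		 * getting lost */
		atomic_xchg(&bis->queued, 0);
		/* ... fold bis->cur - bis->last into the parent ... */
	}

The atomic flag keeps each bis on the percpu llist at most once, and
clearing it before the stats are read means a racing update gets
re-queued for the next flush rather than dropped.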