On Thu, Jun 02, 2022 at 03:20:20PM -0400, Waiman Long wrote:
> For a system with many CPUs and block devices, the time to do
> blkcg_rstat_flush() from cgroup_rstat_flush() can be rather long. It
> can be especially problematic as interrupts are disabled during the
> flush. It was reported that it might take seconds to complete in some
> extreme cases, leading to hard lockup messages.
>
> As it is likely that not all the percpu blkg_iostat_set's have been
> updated since the last flush, those stale blkg_iostat_set's don't need
> to be flushed in this case. This patch optimizes blkcg_rstat_flush()
> by keeping a lockless list of recently updated blkg_iostat_set's in a
> newly added percpu blkcg->lhead pointer.
>
> The blkg_iostat_set is added to the lockless list on the update side
> in blk_cgroup_bio_start(). It is removed from the lockless list when
> flushed in blkcg_rstat_flush(). Due to racing, it is possible that
> blkg_iostat_set's in the lockless list may have no new IO stats to be
> flushed. To protect against destruction of the blkg, a percpu
> reference is taken when a blkg_iostat_set is put on the lockless list
> and put back when it is removed.
>
> A blkg_iostat_set can determine if it is on a lockless list by
> checking the content of its lnode.next pointer, which will be non-NULL
> when it is on a lockless list. This requires a special llist_last
> sentinel node to be placed at the end of the lockless list.
>
> When booting up an instrumented test kernel with this patch on a
> 2-socket 96-thread system with cgroup v2, 1788 of the 2051 calls to
> cgroup_rstat_flush() after bootup exited immediately because of an
> empty lockless list. After an all-CPU kernel build, the ratio became
> 6295424/6340513, more than 99%.
>
> Signed-off-by: Waiman Long <longman@xxxxxxxxxx>
> Acked-by: Tejun Heo <tj@xxxxxxxxxx>

Reviewed-by: Ming Lei <ming.lei@xxxxxxxxxx>

Thanks,
Ming
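
P.S. A minimal userspace sketch, assuming C11 atomics, of the sentinel
trick the patch describes (non-NULL ->next means "already on the list",
so a dummy last node terminates the list instead of NULL). The names
llist_push, llist_pop_all, and struct lnode are illustrative only, not
the kernel's <linux/llist.h> API, and the percpu-ref get/put is only
indicated in comments:

#include <stdatomic.h>
#include <stddef.h>
#include <stdio.h>

struct lnode {
	struct lnode *next;	/* NULL == not on any list */
};

/* Sentinel terminating every list, so ->next is never NULL in-list. */
static struct lnode llist_last;

struct llist_head {
	_Atomic(struct lnode *) first;
};

/* Update side: queue @n unless already queued; returns 1 on success. */
static int llist_push(struct llist_head *h, struct lnode *n)
{
	struct lnode *first;

	/* Non-NULL next means the node is already on a list. */
	if (n->next)
		return 0;

	/* ...take a percpu reference on the owner here... */
	first = atomic_load(&h->first);
	do {
		n->next = first ? first : &llist_last;
	} while (!atomic_compare_exchange_weak(&h->first, &first, n));
	return 1;
}

/* Flush side: detach the whole list in one atomic exchange. */
static struct lnode *llist_pop_all(struct llist_head *h)
{
	return atomic_exchange(&h->first, NULL);
}

int main(void)
{
	struct llist_head head = { NULL };
	struct lnode a = { NULL }, b = { NULL };

	llist_push(&head, &a);
	llist_push(&head, &a);	/* no-op: already queued */
	llist_push(&head, &b);

	for (struct lnode *n = llist_pop_all(&head);
	     n && n != &llist_last; ) {
		struct lnode *next = n->next;

		n->next = NULL;	/* mark "not on list" again */
		/* ...flush this node's stats, put the reference... */
		printf("flushed node %p\n", (void *)n);
		n = next;
	}
	return 0;
}

The point of the sentinel is that membership can be tested from the
node itself with a single pointer load, and an empty head lets the
flush path bail out without touching any per-node state, which is where
the >99% of immediate exits in the commit message comes from.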