On Thu, Jun 02, 2022 at 03:20:20PM -0400, Waiman Long wrote:
> For a system with many CPUs and block devices, the time to do
> blkcg_rstat_flush() from cgroup_rstat_flush() can be rather long. It
> can be especially problematic as interrupts are disabled during the
> flush. It was reported that it might take seconds to complete in some
> extreme cases, leading to hard lockup messages.
>
> As it is likely that not all the percpu blkg_iostat_set's have been
> updated since the last flush, those stale blkg_iostat_set's don't need
> to be flushed in this case. This patch optimizes blkcg_rstat_flush()
> by keeping a lockless list of recently updated blkg_iostat_set's in a
> newly added percpu blkcg->lhead pointer.
>
> The blkg_iostat_set is added to the lockless list on the update side
> in blk_cgroup_bio_start(). It is removed from the lockless list when
> flushed in blkcg_rstat_flush(). Due to racing, it is possible that
> blkg_iostat_set's in the lockless list may have no new IO stats to be
> flushed. To protect against destruction of the blkg, a percpu
> reference is taken when a blkg_iostat_set is put on the lockless list
> and put back when it is removed.
>
> A blkg_iostat_set can determine if it is on a lockless list by
> checking the content of its lnode.next pointer, which will be non-NULL
> when it is on a lockless list. This requires a special llist_last
> sentinel node to be placed at the end of the lockless list.
>
> When booting up an instrumented test kernel with this patch on a
> 2-socket 96-thread system with cgroup v2, 1788 of the 2051 calls to
> cgroup_rstat_flush() after bootup exited immediately because of an
> empty lockless list. After an all-CPU kernel build, the ratio became
> 6295424/6340513, more than 99%.
>
> Signed-off-by: Waiman Long <longman@xxxxxxxxxx>
> Acked-by: Tejun Heo <tj@xxxxxxxxxx>

Reviewed-by: Ming Lei <ming.lei@xxxxxxxxxx>

Thanks,
Ming
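
P.S. A minimal userspace sketch, assuming C11 atomics, of the sentinel
trick the patch describes (non-NULL ->next means "already on the list",
so a dummy last node terminates the list instead of NULL). The names
llist_push, llist_pop_all, and struct lnode are illustrative only, not
the kernel's <linux/llist.h> API, and the percpu-ref get/put is only
indicated in comments:

#include <stdatomic.h>
#include <stddef.h>
#include <stdio.h>

struct lnode {
	struct lnode *next;	/* NULL == not on any list */
};

/* Sentinel terminating every list, so ->next is never NULL in-list. */
static struct lnode llist_last;

struct llist_head {
	_Atomic(struct lnode *) first;
};

/* Update side: queue @n unless already queued; returns 1 on success. */
static int llist_push(struct llist_head *h, struct lnode *n)
{
	struct lnode *first;

	/* Non-NULL next means the node is already on a list. */
	if (n->next)
		return 0;

	/* ...take a percpu reference on the owner here... */
	first = atomic_load(&h->first);
	do {
		n->next = first ? first : &llist_last;
	} while (!atomic_compare_exchange_weak(&h->first, &first, n));
	return 1;
}

/* Flush side: detach the whole list in one atomic exchange. */
static struct lnode *llist_pop_all(struct llist_head *h)
{
	return atomic_exchange(&h->first, NULL);
}

int main(void)
{
	struct llist_head head = { NULL };
	struct lnode a = { NULL }, b = { NULL };

	llist_push(&head, &a);
	llist_push(&head, &a);	/* no-op: already queued */
	llist_push(&head, &b);

	for (struct lnode *n = llist_pop_all(&head);
	     n && n != &llist_last; ) {
		struct lnode *next = n->next;

		n->next = NULL;	/* mark "not on list" again */
		/* ...flush this node's stats, put the reference... */
		printf("flushed node %p\n", (void *)n);
		n = next;
	}
	return 0;
}

The point of the sentinel is that membership can be tested from the
node itself with a single pointer load, and an empty head lets the
flush path bail out without touching any per-node state, which is where
the >99% of immediate exits in the commit message comes from.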