On 11/26/22 7:29 AM, Yi Zhang wrote:
> Hi Jens
> Sorry for the delay as I couldn't reproduce it with the original
> for-6.2/block branch.
> Finally, I rebased the for-6.2/block branch on 6.1-rc6 and was able to
> bisect it:
>
>
> 951d1e94801f95a3fc1c75ff342431c9f519dd14 is the first bad commit
> commit 951d1e94801f95a3fc1c75ff342431c9f519dd14
> Author: Waiman Long <longman@xxxxxxxxxx>
> Date:   Fri Nov 4 20:59:02 2022 -0400
>
>     blk-cgroup: Flush stats at blkgs destruction path
>
>     As noted by Michal, the blkg_iostat_set's in the lockless list
>     hold reference to blkg's to protect against their removal. Those
>     blkg's hold reference to blkcg. When a cgroup is being destroyed,
>     cgroup_rstat_flush() is only called at css_release_work_fn() which is
>     called when the blkcg reference count reaches 0. This circular dependency
>     will prevent blkcg from being freed until some other events cause
>     cgroup_rstat_flush() to be called to flush out the pending blkcg stats.
>
>     To prevent this delayed blkcg removal, add a new cgroup_rstat_css_flush()
>     function to flush stats for a given css and cpu and call it at the blkgs
>     destruction path, blkcg_destroy_blkgs(), whenever there are still some
>     pending stats to be flushed. This will ensure that blkcg reference
>     count can reach 0 ASAP.
>
>     Signed-off-by: Waiman Long <longman@xxxxxxxxxx>
>     Acked-by: Tejun Heo <tj@xxxxxxxxxx>
>     Link: https://lore.kernel.org/r/20221105005902.407297-4-longman@xxxxxxxxxx
>     Signed-off-by: Jens Axboe <axboe@xxxxxxxxx>

Waiman, let me know if you have an idea what is going on here and can
send in a fix, or if I need to revert this one. From looking at the
lists of commits after these reports came in, I did suspect this commit.
But I don't know enough about this area to render an opinion on a fix
without spending more time on it.

-- 
Jens Axboe