On 11/26/22 10:53, Jens Axboe wrote:
On 11/26/22 7:29 AM, Yi Zhang wrote:
Hi Jens
Sorry for the delay as I couldn't reproduce it with the original
for-6.2/block branch.
Finally, I rebased the for-6.2/block branch on 6.1-rc6 and was able to
bisect it:
951d1e94801f95a3fc1c75ff342431c9f519dd14 is the first bad commit
commit 951d1e94801f95a3fc1c75ff342431c9f519dd14
Author: Waiman Long <longman@xxxxxxxxxx>
Date: Fri Nov 4 20:59:02 2022 -0400
blk-cgroup: Flush stats at blkgs destruction path
As noted by Michal, the blkg_iostat_set's in the lockless list
hold reference to blkg's to protect against their removal. Those
blkg's hold reference to blkcg. When a cgroup is being destroyed,
cgroup_rstat_flush() is only called at css_release_work_fn() which is
called when the blkcg reference count reaches 0. This circular dependency
will prevent blkcg from being freed until some other events cause
cgroup_rstat_flush() to be called to flush out the pending blkcg stats.
To prevent this delayed blkcg removal, add a new cgroup_rstat_css_flush()
function to flush stats for a given css and cpu and call it at the blkgs
destruction path, blkcg_destroy_blkgs(), whenever there are still some
pending stats to be flushed. This will ensure that blkcg reference
count can reach 0 ASAP.
Signed-off-by: Waiman Long <longman@xxxxxxxxxx>
Acked-by: Tejun Heo <tj@xxxxxxxxxx>
Link: https://lore.kernel.org/r/20221105005902.407297-4-longman@xxxxxxxxxx
Signed-off-by: Jens Axboe <axboe@xxxxxxxxx>
Waiman, let me know if you have an idea what is going on here and can
send in a fix, or if I need to revert this one. From looking at the
lists of commits after these reports came in, I did suspect this
commit. But I don't know enough about this area to render an opinion
on a fix without spending more time on it.
Sure. I will take a closer look at that. Will let you know my
investigation result ASAP.
Thanks,
Longman