On 11/26/22 06:29, Yi Zhang wrote:
Finally, I rebased the for-6.2/block branch on 6.1-rc6 and was able to
bisect it:

951d1e94801f95a3fc1c75ff342431c9f519dd14 is the first bad commit
commit 951d1e94801f95a3fc1c75ff342431c9f519dd14
Author: Waiman Long <longman@xxxxxxxxxx>
Date:   Fri Nov 4 20:59:02 2022 -0400

    blk-cgroup: Flush stats at blkgs destruction path

    As noted by Michal, the blkg_iostat_set's in the lockless list
    hold reference to blkg's to protect against their removal. Those
    blkg's hold reference to blkcg. When a cgroup is being destroyed,
    cgroup_rstat_flush() is only called at css_release_work_fn() which
    is called when the blkcg reference count reaches 0. This circular
    dependency will prevent blkcg from being freed until some other
    events cause cgroup_rstat_flush() to be called to flush out the
    pending blkcg stats.

    To prevent this delayed blkcg removal, add a new
    cgroup_rstat_css_flush() function to flush stats for a given css
    and cpu and call it at the blkgs destruction path,
    blkcg_destroy_blkgs(), whenever there are still some pending stats
    to be flushed. This will ensure that blkcg reference count can
    reach 0 ASAP.

    Signed-off-by: Waiman Long <longman@xxxxxxxxxx>
    Acked-by: Tejun Heo <tj@xxxxxxxxxx>
    Link: https://lore.kernel.org/r/20221105005902.407297-4-longman@xxxxxxxxxx
    Signed-off-by: Jens Axboe <axboe@xxxxxxxxx>

I can confirm this report. If I revert patch "blk-cgroup: Flush stats at blkgs destruction path" on top of the block/for-next branch from last Wednesday then test block/027 passes. Test block/027 fails systematically with an unmodified block/for-next branch.
Bart.