On 11/26/22 17:54, Waiman Long wrote:
On 11/26/22 10:53, Jens Axboe wrote:
On 11/26/22 7:29 AM, Yi Zhang wrote:
Hi Jens
Sorry for the delay as I couldn't reproduce it with the original
for-6.2/block branch.
Finally, I rebased the for-6.2/block branch on 6.1-rc6 and was able to
bisect it:
951d1e94801f95a3fc1c75ff342431c9f519dd14 is the first bad commit
commit 951d1e94801f95a3fc1c75ff342431c9f519dd14
Author: Waiman Long <longman@xxxxxxxxxx>
Date: Fri Nov 4 20:59:02 2022 -0400
blk-cgroup: Flush stats at blkgs destruction path
As noted by Michal, the blkg_iostat_set's in the lockless list
hold reference to blkg's to protect against their removal. Those
blkg's hold reference to blkcg. When a cgroup is being destroyed,
cgroup_rstat_flush() is only called at css_release_work_fn() which is
called when the blkcg reference count reaches 0. This circular dependency
will prevent blkcg from being freed until some other events cause
cgroup_rstat_flush() to be called to flush out the pending blkcg stats.
To prevent this delayed blkcg removal, add a new cgroup_rstat_css_flush()
function to flush stats for a given css and cpu and call it at the blkgs
destruction path, blkcg_destroy_blkgs(), whenever there are still some
pending stats to be flushed. This will ensure that blkcg reference
count can reach 0 ASAP.
Signed-off-by: Waiman Long <longman@xxxxxxxxxx>
Acked-by: Tejun Heo <tj@xxxxxxxxxx>
Link: https://lore.kernel.org/r/20221105005902.407297-4-longman@xxxxxxxxxx
Signed-off-by: Jens Axboe <axboe@xxxxxxxxx>
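[Editor's illustration, not part of the thread or the patch] The circular
dependency described in the commit message can be sketched as a toy
refcount model. This is a user-space sketch under stated assumptions, not
kernel code: every toy_* name is hypothetical, and the model only captures
"a pending stat pins its blkg, and a blkg pins its blkcg". It says nothing
about the panic being discussed.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: every toy_* name is hypothetical, not a kernel symbol. */
struct toy_blkcg  { int refcnt; bool freed; };
struct toy_blkg   { int refcnt; bool freed; struct toy_blkcg *blkcg; };
struct toy_iostat { bool queued; struct toy_blkg *blkg; };

/* Drop one blkcg reference; mark it freed when the count reaches 0. */
static void toy_blkcg_put(struct toy_blkcg *cg)
{
	assert(cg->refcnt > 0);
	if (--cg->refcnt == 0)
		cg->freed = true;
}

/* Drop one blkg reference; the last put also releases the blkcg ref. */
static void toy_blkg_put(struct toy_blkg *g)
{
	assert(g->refcnt > 0);
	if (--g->refcnt == 0) {
		g->freed = true;
		toy_blkcg_put(g->blkcg);
	}
}

/* Flushing a queued stat entry releases the blkg reference it holds. */
static void toy_flush(struct toy_iostat *s)
{
	if (s->queued) {
		s->queued = false;
		toy_blkg_put(s->blkg);
	}
}

/* Returns true if the blkcg has been freed by the end of teardown. */
bool toy_teardown(bool flush_during_destroy)
{
	/* blkcg: one ref from the cgroup, one from the blkg. */
	struct toy_blkcg cg = { .refcnt = 2, .freed = false };
	/* blkg: one base ref, one from the pending stat entry. */
	struct toy_blkg g = { .refcnt = 2, .freed = false, .blkcg = &cg };
	struct toy_iostat s = { .queued = true, .blkg = &g };

	toy_blkg_put(&g);		/* destroy path drops the base ref */
	if (flush_during_destroy)
		toy_flush(&s);		/* releases the stat's blkg ref */
	toy_blkcg_put(&cg);		/* cgroup drops its own ref */
	return cg.freed;
}
```

In this model, toy_teardown(false) leaves the blkcg pinned by the unflushed
stat entry, while toy_teardown(true) lets its count reach 0 during teardown.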
Waiman, let me know if you have an idea what is going on here and can
send in a fix, or if I need to revert this one. From looking at the
lists of commits after these reports came in, I did suspect this
commit. But I don't know enough about this area to render an opinion
on a fix without spending more time on it.
Sure. I will take a closer look at that. Will let you know my
investigation result ASAP.
Thanks Yi for allowing me to access the system that can reproduce the
bug. I found out that the panic problem is fixed by moving the rstat
flushing before the destruction of blkgs in blkcg_destroy_blkgs(). I
will post another patch later to fix that bug. However, I want to spend
a bit more time to see if I can figure out what causes the panic in the
first place.
Cheers,
Longman