Hi, Tejun!
On 2022/12/01 4:42, Tejun Heo wrote:
On Wed, Nov 30, 2022 at 09:21:54PM +0800, Li Nan wrote:
T1                          T2                      T3
//delete device
del_gendisk
 bdi_unregister
  bdi_remove_from_list
   synchronize_rcu_expedited

                            //rmdir cgroup
                            blkcg_destroy_blkgs
                             blkg_destroy
                              percpu_ref_kill
                               blkg_release
                                call_rcu
 rq_qos_exit
  ioc_rqos_exit
   kfree(ioc)

                                                    __blkg_release
                                                     blkg_free
                                                      blkg_free_workfn
                                                       pd_free_fn
                                                        ioc_pd_free
                                                         spin_lock_irqsave
                                                          ->ioc is freed
Fix the problem by moving the operations on ioc in ioc_pd_free() to
ioc_pd_offline(), and only freeing resources in ioc_pd_free(), as
iolatency and throttle already do.
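A minimal userspace sketch of the ordering argument, assuming a simplified model: the structs, the `freed` flag that stands in for kfree(), and the model_* helpers are hypothetical and are not the blk-iocost code. It replays the T1/T2/T3 ordering above and counts accesses to the ioc after it has been "freed", once with the old layout (pd_free touches ioc) and once with the offline/free split.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical userspace model, not blk-iocost code. "ioc" stands for the
 * per-queue controller and "iocg" for the per-cgroup policy data that
 * points back at it.
 */
struct ioc {
	bool freed;    /* set instead of free() so a late access can be counted */
	int nr_active; /* stands in for the ioc's list of active iocgs */
};

struct iocg {
	struct ioc *ioc;
	bool active;   /* stands in for !list_empty(&iocg->active_list) */
};

static int uaf_count; /* accesses to an ioc after it was "freed" */

/* Every dereference of iocg->ioc goes through here in the model. */
static struct ioc *touch_ioc(struct ioc *ioc)
{
	if (ioc->freed)
		uaf_count++;
	return ioc;
}

/* Old layout: pd_free both unlinks from the ioc and frees the iocg. */
static void model_pd_free_old(struct iocg *iocg)
{
	struct ioc *ioc = touch_ioc(iocg->ioc); /* models taking ioc->lock */

	if (iocg->active)
		ioc->nr_active--;
	free(iocg);
}

/* New layout: all work on the ioc moves to pd_offline... */
static void model_pd_offline(struct iocg *iocg)
{
	struct ioc *ioc = touch_ioc(iocg->ioc);

	if (iocg->active) {
		ioc->nr_active--;
		iocg->active = false;
	}
}

/* ...and pd_free only releases the iocg's own memory. */
static void model_pd_free(struct iocg *iocg)
{
	free(iocg);
}

/* Replays the T1/T2/T3 ordering; returns the number of late ioc accesses. */
static int replay_race(bool split_offline_free)
{
	struct ioc ioc = { .freed = false, .nr_active = 1 };
	struct iocg *iocg = malloc(sizeof(*iocg));

	assert(iocg != NULL);
	iocg->ioc = &ioc;
	iocg->active = true;
	uaf_count = 0;

	if (split_offline_free)
		model_pd_offline(iocg); /* T2: blkg_destroy() runs pd_offline */
	ioc.freed = true;               /* T1: ioc_rqos_exit() -> kfree(ioc) */
	if (split_offline_free)
		model_pd_free(iocg);    /* T3: blkg_free_workfn() -> pd_free */
	else
		model_pd_free_old(iocg);
	return uaf_count;
}
```

In this model, replay_race(false) reports one late access to the ioc and replay_race(true) reports none, because the only dereference of iocg->ioc happens in the offline step, before T1 frees it.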
Signed-off-by: Li Nan <linan122@xxxxxxxxxx>
I wonder whether what we really want to do is pin ioc while blkgs are still
around, but I think this should work too.
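For comparison, the alternative of pinning ioc while blkgs are still around could look roughly like the reference-counting sketch below. It is hypothetical and single-threaded: ioc_create(), ioc_get(), ioc_put(), model_pd_alloc() and model_pd_free() are illustrative names, a real version would need an atomic refcount, and this is not what the patch does.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical single-threaded model of "pin the ioc while blkgs are still
 * around": each iocg holds a reference, so the ioc outlives every pd_free.
 * A real version would need an atomic refcount; this is not the patch.
 */
struct ioc {
	int refcnt;
};

struct iocg {
	struct ioc *ioc;
};

static int ioc_frees; /* how many times an ioc was actually released */

static struct ioc *ioc_create(void)
{
	struct ioc *ioc = malloc(sizeof(*ioc));

	assert(ioc != NULL);
	ioc->refcnt = 1; /* reference owned by the request queue side */
	return ioc;
}

static void ioc_get(struct ioc *ioc)
{
	ioc->refcnt++;
}

static void ioc_put(struct ioc *ioc)
{
	if (--ioc->refcnt == 0) {
		ioc_frees++;
		free(ioc);
	}
}

static struct iocg *model_pd_alloc(struct ioc *ioc)
{
	struct iocg *iocg = malloc(sizeof(*iocg));

	assert(iocg != NULL);
	ioc_get(ioc); /* the iocg pins its ioc */
	iocg->ioc = ioc;
	return iocg;
}

static void model_pd_free(struct iocg *iocg)
{
	struct ioc *ioc = iocg->ioc; /* still valid: we hold a reference */

	/* work that needs ioc (e.g. unlinking under its lock) would go here */
	free(iocg);
	ioc_put(ioc); /* may drop the last reference and free the ioc */
}

static int frees_after_exit = -1, frees_after_pd_free = -1;

/* Queue teardown drops its reference first, the deferred pd_free runs last. */
static void replay_pinned(void)
{
	struct ioc *ioc;
	struct iocg *iocg;

	ioc_frees = 0;
	ioc = ioc_create();
	iocg = model_pd_alloc(ioc);

	ioc_put(ioc);                    /* T1: ioc_rqos_exit() */
	frees_after_exit = ioc_frees;    /* 0: the iocg still pins it */
	model_pd_free(iocg);             /* T3: pd_free from the workqueue */
	frees_after_pd_free = ioc_frees; /* 1: last reference gone */
}
```

The trade-off in this model is that pd_free may keep dereferencing ioc, at the cost of the ioc's lifetime now depending on every blkg rather than only on the request queue.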
I just found that this is not enough; other problems still exist:
t1:
bio_init
 bio_associate_blkg
  //get blkg->refcnt
......
submit_bio
 blk_throtl_bio
 // bio is throttled, user thread can exit

t2:
// blkcg online_pin is zero
blkcg_destroy_blkgs
 blkg_destroy
  ioc_pd_offline
   list_del_init(&iocg->active_list)

t3:
ioc_rqos_throttle
 blkg_to_iocg
 // got the iocg that is offlined
 iocg_activate
 // activate the iocg again
As a consequence, the kernel can crash by accessing an unexpected
address. Fortunately, bfq already handles a similar problem by checking
blkg->online in bfq_bio_bfqg(), so this problem can be fixed by checking
it in iocg_activate() as well.
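A minimal single-threaded sketch of the proposed check, assuming a simplified model in which blkg->online and the active list are plain fields. The model_* names are hypothetical, and locking is deliberately left out (in the kernel the check has to be correctly ordered against blkg_destroy()). It replays t2 and t3 above and reports how many iocgs are left linked as active.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical single-threaded model, not blk-iocost code. Locking is left
 * out: in the kernel the online check must be ordered against blkg_destroy().
 */
struct blkg {
	bool online;
};

struct ioc {
	int nr_active; /* stands in for the ioc's list of active iocgs */
};

struct iocg {
	struct blkg *blkg;
	struct ioc *ioc;
	bool active;   /* stands in for !list_empty(&iocg->active_list) */
};

/* t2: unlink the iocg and mark its blkg offline (the model folds
 * ioc_pd_offline() and clearing blkg->online into one step). */
static void model_blkg_destroy(struct iocg *iocg)
{
	if (iocg->active) {
		iocg->active = false;
		iocg->ioc->nr_active--;
	}
	iocg->blkg->online = false;
}

/* t3: activation, optionally with the proposed blkg->online check. */
static bool model_iocg_activate(struct iocg *iocg, bool check_online)
{
	if (iocg->active)
		return true;
	if (check_online && !iocg->blkg->online)
		return false; /* refuse to relink an offlined iocg */
	iocg->active = true;
	iocg->ioc->nr_active++;
	return true;
}

/* Replays t2 then t3; returns how many iocgs end up linked as active. */
static int replay_reactivation(bool check_online)
{
	struct blkg blkg = { .online = true };
	struct ioc ioc = { .nr_active = 0 };
	struct iocg iocg = { .blkg = &blkg, .ioc = &ioc, .active = false };

	model_blkg_destroy(&iocg);                /* t2: rmdir cgroup */
	model_iocg_activate(&iocg, check_online); /* t3: throttled bio resumes */
	return ioc.nr_active;
}
```

Without the check, replay_reactivation(false) leaves the offlined iocg linked again (returns 1); with it, replay_reactivation(true) returns 0.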
BTW, I'm still checking whether other policies have the same
problem.
Thanks,
Kuai