On Wed, Nov 30, 2022 at 09:21:56PM +0800, Li Nan wrote:
> From: Yu Kuai <yukuai3@xxxxxxxxxx>
>
> Our test report a problem:
>
> ------------[ cut here ]------------
> list_del corruption. next->prev should be ffff888127e0c4b0, but was ffff888127e090b0
> WARNING: CPU: 2 PID: 3117789 at lib/list_debug.c:62 __list_del_entry_valid+0x119/0x130
> RIP: 0010:__list_del_entry_valid+0x119/0x130
> RIP: 0010:__list_del_entry_valid+0x119/0x130
> Call Trace:
>  <IRQ>
>  iocg_flush_stat.isra.0+0x11e/0x230
>  ? ioc_rqos_done+0x230/0x230
>  ? ioc_now+0x14f/0x180
>  ioc_timer_fn+0x569/0x1640
>
> We haven't reproduced it yet, but we think this is due to parent iocg is
> freed before child iocg, and then in ioc_timer_fn, walk_list is
> corrupted.
>
> 1) Remove child cgroup can concurrent with remove parent cgroup, and
> ioc_pd_free for parent iocg can be called before child iocg. This can be
> fixed by moving the handle of walk_list to ioc_pd_offline, since that
> offline from child is ensured to be called before parent.

Which you already did in a previous patch, right?

> 2) ioc_pd_free can be triggered from both removing device and removing
> cgroup, this patch fix the problem by deleting timer before deactivating
> policy, so that free parent iocg first in this case won't matter.

Okay, so, yeah, css's pin parents but blkg's don't. I think the right thing
to do here is making sure that a child blkg pins its parent (and eventually
ioc).
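
To illustrate what "a child pins its parent" means, here is a minimal
userspace sketch of the pattern. It is not blk-cgroup code: struct node,
node_alloc() and node_put() are hypothetical names made up for this example,
the refcount is a plain int (single-threaded only), and the free_order hook
exists only so the free ordering can be observed.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a blkg-like object; not a kernel structure. */
struct node {
	int refcnt;		/* plain int: this sketch is single-threaded */
	struct node *parent;	/* pinned for the child's whole lifetime */
	int *free_order;	/* observation hook: records when we were freed */
};

/* Monotonic counter used only to record the order in which nodes are freed. */
static int free_seq;

/* Allocate a node holding one reference for the caller; pin the parent. */
struct node *node_alloc(struct node *parent, int *free_order)
{
	struct node *n = calloc(1, sizeof(*n));

	if (!n)
		return NULL;
	n->refcnt = 1;
	n->parent = parent;
	n->free_order = free_order;
	if (parent)
		parent->refcnt++;	/* the child takes a reference on its parent */
	return n;
}

/*
 * Drop one reference. When the count reaches zero the node is freed and the
 * pin it held on its parent is dropped, which may free the parent in turn.
 */
void node_put(struct node *n)
{
	while (n) {
		struct node *parent;

		assert(n->refcnt > 0);
		if (--n->refcnt)
			return;		/* someone still holds a reference */

		parent = n->parent;
		if (n->free_order)
			*n->free_order = ++free_seq;
		free(n);
		n = parent;		/* release the pin on the parent */
	}
}
```

With this scheme, dropping the parent's own reference first does not free it
while a child still exists; the parent goes away only after the last child
does, so a walker can never find a child whose parent is already gone. Real
code would need the kernel's own reference-counting primitives and locking
instead of a bare int.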

> Signed-off-by: Yu Kuai <yukuai3@xxxxxxxxxx>
> Signed-off-by: Li Nan <linan122@xxxxxxxxxx>
> ---
>  block/blk-iocost.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/block/blk-iocost.c b/block/blk-iocost.c
> index 710cf63a1643..d2b873908f88 100644
> --- a/block/blk-iocost.c
> +++ b/block/blk-iocost.c
> @@ -2813,13 +2813,14 @@ static void ioc_rqos_exit(struct rq_qos *rqos)
>  {
>  	struct ioc *ioc = rqos_to_ioc(rqos);
>
> +	del_timer_sync(&ioc->timer);
> +
>  	blkcg_deactivate_policy(rqos->q, &blkcg_policy_iocost);
>
>  	spin_lock_irq(&ioc->lock);
>  	ioc->running = IOC_STOP;
>  	spin_unlock_irq(&ioc->lock);
>
> -	del_timer_sync(&ioc->timer);

I don't know about this workaround. Let's fix properly?

--
tejun