Re: [PATCH -next v2 9/9] blk-iocost: fix walk_list corruption

On Wed, Nov 30, 2022 at 09:21:56PM +0800, Li Nan wrote:
> From: Yu Kuai <yukuai3@xxxxxxxxxx>
> 
> Our test reported a problem:
> 
> ------------[ cut here ]------------
> list_del corruption. next->prev should be ffff888127e0c4b0, but was ffff888127e090b0
> WARNING: CPU: 2 PID: 3117789 at lib/list_debug.c:62 __list_del_entry_valid+0x119/0x130
> RIP: 0010:__list_del_entry_valid+0x119/0x130
> Call Trace:
>  <IRQ>
>  iocg_flush_stat.isra.0+0x11e/0x230
>  ? ioc_rqos_done+0x230/0x230
>  ? ioc_now+0x14f/0x180
>  ioc_timer_fn+0x569/0x1640
> 
> We haven't reproduced it yet, but we think the parent iocg is freed
> before the child iocg, and walk_list is then corrupted in ioc_timer_fn.
> 
> 1) Removing a child cgroup can run concurrently with removing its parent
> cgroup, so ioc_pd_free for the parent iocg can be called before that of
> the child iocg. This can be fixed by moving the walk_list handling to
> ioc_pd_offline, since offline is guaranteed to be called for the child
> before the parent.

Which you already did in a previous patch, right?

> 2) ioc_pd_free can be triggered by both removing the device and removing
> the cgroup. This patch fixes the problem by deleting the timer before
> deactivating the policy, so that freeing the parent iocg first no longer
> matters in this case.

Okay, so, yeah, css's pin parents but blkg's don't. I think the right thing
to do here is making sure that a child blkg pins its parent (and eventually
ioc).
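
To illustrate what "a child pins its parent" means, here is a minimal
userspace sketch, not kernel code. struct node, node_create() and
node_put() are hypothetical stand-ins for a blkg/iocg and its get/put
helpers, and the plain int refcount assumes a single thread; an actual
fix would presumably use the kernel's own reference counting. The point
is only that the parent cannot be freed while a child still references it:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a blkg/iocg; not the kernel's structure. */
struct node {
	struct node *parent;
	int refcnt;		/* plain int: single-threaded sketch only */
	int *freed_flag;	/* set to 1 when this node is freed (demo aid) */
};

/* Create a node holding one reference for the creator. */
static struct node *node_create(struct node *parent, int *freed_flag)
{
	struct node *n = calloc(1, sizeof(*n));

	if (!n)
		return NULL;
	n->refcnt = 1;
	n->freed_flag = freed_flag;
	n->parent = parent;
	if (parent)
		parent->refcnt++;	/* the child pins its parent */
	return n;
}

/*
 * Drop one reference. When the count reaches zero the node is freed and
 * the reference it held on its parent is dropped in turn, so ancestors
 * are always released after their descendants.
 */
static void node_put(struct node *n)
{
	while (n) {
		struct node *parent = n->parent;

		assert(n->refcnt > 0);
		if (--n->refcnt > 0)
			return;
		if (n->freed_flag)
			*n->freed_flag = 1;
		free(n);
		n = parent;
	}
}
```

With this scheme, dropping the creator's reference on the parent first
does not free it; the parent goes away only after the last child is put,
so the free order no longer depends on which removal path runs first.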

> Signed-off-by: Yu Kuai <yukuai3@xxxxxxxxxx>
> Signed-off-by: Li Nan <linan122@xxxxxxxxxx>
> ---
>  block/blk-iocost.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/block/blk-iocost.c b/block/blk-iocost.c
> index 710cf63a1643..d2b873908f88 100644
> --- a/block/blk-iocost.c
> +++ b/block/blk-iocost.c
> @@ -2813,13 +2813,14 @@ static void ioc_rqos_exit(struct rq_qos *rqos)
>  {
>  	struct ioc *ioc = rqos_to_ioc(rqos);
>  
> +	del_timer_sync(&ioc->timer);
> +
>  	blkcg_deactivate_policy(rqos->q, &blkcg_policy_iocost);
>  
>  	spin_lock_irq(&ioc->lock);
>  	ioc->running = IOC_STOP;
>  	spin_unlock_irq(&ioc->lock);
>  
> -	del_timer_sync(&ioc->timer);

I don't know about this workaround. Let's fix it properly?

-- 
tejun


