Re: [PATCH -next 0/4] blk-cgroup: synchronize del_gendisk() with configuring cgroup policy

Yu Kuai <yukuai1@xxxxxxxxxxxxxxx> · Tue, 20 Dec 2022 17:19:12 +0800

Hi,

在 2022/12/20 4:55, Tejun Heo 写道:
Hello,

On Sat, Dec 17, 2022 at 11:09:04AM +0800, Yu Kuai wrote:
From: Yu Kuai <yukuai3@xxxxxxxxxx>

iocost is initialized when it's configured the first time, and iocost
initializing can race with del_gendisk(), which will cause null pointer
dereference:

t1				t2
ioc_qos_write
  blk_iocost_init
   rq_qos_add
   				del_gendisk
   				 rq_qos_exit
   				 //iocost is removed from q->roqs
   blkcg_activate_policy
    pd_init_fn
     ioc_pd_init
      ioc = q_to_ioc(blkg->q)
      //can't find iocost and return null

And iolatency is about to switch to the same lazy initialization.

This patchset fix this problem by synchronize rq_qos_add() and
blkcg_activate_policy() with rq_qos_exit().

So, the patchset seems a bit overly complicated to me. What do you think
about the following?

* These init/exit paths are super cold path, just protecting them with a
   global mutex is probably enough. If we encounter a scalability problem,
   it's easy to fix down the line.

* If we're synchronizing this with a mutex anyway, no need to grab the
   queue_lock, right? rq_qos_add/del/exit() can all just hold the mutex.

* And we can keep the state tracking within rq_qos. When rq_qos_exit() is
   called, mark it so that future adds will fail - be that a special ->next
   value a queue flag or whatever.

Yes, that sounds good. BTW, queue_lock is also used to protect
pd_alloc_fn/pd_init_fn，and we found that blkcg_activate_policy() is
problematic:

blkcg_activate_policy
 spin_lock_irq(&q->queue_lock);
 list_for_each_entry_reverse(blkg, &q->blkg_list
  pd_alloc_fn(GFP_NOWAIT | __GFP_NOWARN,...) -> failed

  spin_unlock_irq(&q->queue_lock);
  // release queue_lock here is problematic, this will cause
pd_offline_fn called without pd_init_fn.
  pd_alloc_fn(__GFP_NOWARN,...)

If we are using a mutex to protect rq_qos ops, it seems the right thing
to do do also using the mutex to protect blkcg_policy ops, and this
problem can be fixed because mutex can be held to alloc memroy with
GFP_KERNEL. What do you think?

Thanks,
Kuai

Thanks.