On Fri, 2022-03-18 at 21:01 +0800, Ming Lei wrote: > In the whole lifetime of blkcg_gq instance, ->q will be referred, such > as, ->pd_free_fn() is called in blkg_free, and throtl_pd_free() still > may touch the request queue via &tg->service_queue.pending_timer which > is handled by throtl_pending_timer_fn(), so it is reasonable to grab > request queue's refcnt by blkcg_gq instance. > > Previously blkcg_exit_queue() is called from blk_release_queue, and it > is hard to avoid the use-after-free. But recently commit 1059699f87eb ("block: > move blkcg initialization/destroy into disk allocation/release handler") > is merged to for-5.18/block, it becomes simple to fix the issue by simply > grabbing request queue's refcnt. This patch, upstream as commit 0a9a25ca7843 ("block: let blkcg_gq grab request queue's refcnt") causes the nvdimm unit tests to spam messages like: [ 51.439133] debugfs: Directory 'pmem2' with parent 'block' already present! [ 52.095679] debugfs: Directory 'pmem3' with parent 'block' already present! [ 52.505613] block device autoloading is deprecated and will be removed. [ 52.791693] debugfs: Directory 'pmem2' with parent 'block' already present! [ 53.240314] debugfs: Directory 'pmem3' with parent 'block' already present! [ 53.373472] debugfs: Directory 'pmem3' with parent 'block' already present! [ 53.688876] nd_pmem btt2.0: No existing arenas [ 53.773097] debugfs: Directory 'pmem2s' with parent 'block' already present! [ 53.884493] debugfs: Directory 'pmem2s' with parent 'block' already present! [ 54.042946] debugfs: Directory 'pmem2s' with parent 'block' already present! [ 54.195954] debugfs: Directory 'pmem2s' with parent 'block' already present! [ 54.383989] debugfs: Directory 'pmem2' with parent 'block' already present! [ 54.577206] nd_pmem btt3.0: No existing arenas ...on a test doing block device setup/teardown. It is still reproducible in v5.18-rc3 and I bisected manually applying commit d578c770c852 ("block: avoid calling blkg_free() in atomic context") to avoid the other identified regression with this change. I'll take a deeper look tomorrow. It could be this is triggering a latent bug in the pmem driver, but wanted to send this in case someone saw an immediate ordering problem caused by this change besides the one fixed by d578c770c852. > > Reported-by: Christoph Hellwig <hch@xxxxxx> > Signed-off-by: Ming Lei <ming.lei@xxxxxxxxxx> > --- > block/blk-cgroup.c | 5 +++++ > 1 file changed, 5 insertions(+) > > diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c > index fa063c6c0338..d53b0d69dd73 100644 > --- a/block/blk-cgroup.c > +++ b/block/blk-cgroup.c > @@ -82,6 +82,8 @@ static void blkg_free(struct blkcg_gq *blkg) > if (blkg->pd[i]) > blkcg_policy[i]->pd_free_fn(blkg->pd[i]); > > + if (blkg->q) > + blk_put_queue(blkg->q); > free_percpu(blkg->iostat_cpu); > percpu_ref_exit(&blkg->refcnt); > kfree(blkg); > @@ -167,6 +169,9 @@ static struct blkcg_gq *blkg_alloc(struct blkcg *blkcg, struct request_queue *q, > if (!blkg->iostat_cpu) > goto err_free; > > + if (!blk_get_queue(q)) > + goto err_free; > + > blkg->q = q; > INIT_LIST_HEAD(&blkg->q_node); > spin_lock_init(&blkg->async_bio_lock);