Re: [PATCH -next v2 8/9] block: fix null-pointer dereference in ioc_pd_init

Yu Kuai <yukuai1@xxxxxxxxxxxxxxx> · Mon, 5 Dec 2022 17:32:17 +0800

Hi, Tejun!

While reviewing rq_qos code, I found that there are some other possible
defects:

1) queue_lock is held to protect rq_qos_add() and rq_qos_del(), whlie
it's not held to protect rq_qos_exit(), which is absolutely not safe
because they can be called concurrently by configuring iocost and
removing device.
I'm thinking about holding the lock to fetch the list and reset
q->rq_qos first:

diff --git a/block/blk-rq-qos.c b/block/blk-rq-qos.c
index 88f0fe7dcf54..271ad65eebd9 100644
--- a/block/blk-rq-qos.c
+++ b/block/blk-rq-qos.c
@@ -288,9 +288,15 @@ void rq_qos_wait(struct rq_wait *rqw, void 
*private_data,

 void rq_qos_exit(struct request_queue *q)
 {
-       while (q->rq_qos) {
-               struct rq_qos *rqos = q->rq_qos;
-               q->rq_qos = rqos->next;
+       struct rq_qos *rqos;
+
+       spin_lock_irq(&q->queue_lock);
+       rqos = q->rq_qos;
+       q->rq_qos = NULL;
+       spin_unlock_irq(&q->queue_lock);
+
+       while (rqos) {
                rqos->ops->exit(rqos);
+               rqos = rqos->next;
        }
 }

2) rq_qos_add() can still succeed after rq_qos_exit() is done, which
will cause memory leak. Hence a checking is required beforing adding
to q->rq_qos. I'm thinking about flag QUEUE_FLAG_DYING first, but the
flag will not set if disk state is not marked GD_OWNS_QUEUE. Since
blk_unregister_queue() is called before rq_qos_exit(), use the queue
flag QUEUE_FLAG_REGISTERED should be OK.

For the current problem that device can be removed while initializing
, I'm thinking about some possible solutions:

Since bfq is initialized in elevator initialization, and others are
in queue initialization, such problem is only possible in iocost, hence
it make sense to fix it in iocost:

1) use open mutex to prevet concurrency, however, this will cause
that configuring iocost will block some other operations that is relied
on open_mutex.

@@ -2889,7 +2889,15 @@ static int blk_iocost_init(struct gendisk *disk)
        if (ret)
                goto err_free_ioc;

+       mutex_lock(&disk->open_mutex);
+       if (!disk_live(disk)) {
+               mutex_unlock(&disk->open_mutex);
+               ret = -ENODEV;
+               goto err_del_qos;
+       }
+
        ret = blkcg_activate_policy(q, &blkcg_policy_iocost);
+       mutex_unlock(&disk->open_mutex);
        if (ret)
                goto err_del_qos;

2) like 1), the difference is that define a new mutex just in iocst.

3) Or is it better to fix it in the higher level? For example:
add a new restriction that blkcg_deactivate_policy() should be called
with blkcg_activate_policy() in pairs, and blkcg_deactivate_policy()
will wait for blkcg_activate_policy() to finish. Something like:

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index ef4fef1af909..6266f702157f 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -1410,7 +1410,7 @@ int blkcg_activate_policy(struct request_queue *q,
        struct blkcg_gq *blkg, *pinned_blkg = NULL;
        int ret;

-       if (blkcg_policy_enabled(q, pol))
+       if (WARN_ON_ONCE(blkcg_policy_enabled(q, pol)))
                return 0;

        if (queue_is_mq(q))
@@ -1477,6 +1477,8 @@ int blkcg_activate_policy(struct request_queue *q,
                blkg_put(pinned_blkg);
        if (pd_prealloc)
                pol->pd_free_fn(pd_prealloc);
+       if (!ret)
+               wake_up(q->policy_waitq);
        return ret;

 enomem:
@@ -1512,7 +1514,7 @@ void blkcg_deactivate_policy(struct request_queue *q,
        struct blkcg_gq *blkg;

        if (!blkcg_policy_enabled(q, pol))
-               return;
+               wait_event(q->policy_waitq, blkcg_policy_enabled(q, pol));
   wait_event(q->xxx, blkcg_policy_enabled(q, pol));