On Mon, Sep 14, 2020 at 10:14 PM Pradeep P V K <ppvk@xxxxxxxxxxxxxx> wrote:
>
> Observed the below crash while accessing the (use-after-free) lock member
> of bfq data.
>
> context#1                          context#2
>                                    process_one_work()
> kthread()                          blk_mq_run_work_fn()
> worker_thread()                    ->__blk_mq_run_hw_queue()
> process_one_work()                 ->blk_mq_sched_dispatch_requests()
> __blk_release_queue()              ->blk_mq_do_dispatch_sched()
> ->__elevator_exit()
>  ->blk_mq_exit_sched()
>   ->exit_sched()
>    ->kfree()
>                                    ->bfq_dispatch_request()
>                                     ->spin_unlock_irq(&bfqd->lock)
>
> This is because the kblockd delayed work might get scheduled around
> blk_release_queue() and then access an already freed member of
> bfq_data.
>
> 240.212359: <2> Unable to handle kernel paging request at
> virtual address ffffffee2e33ad70
> ...
> 240.212637: <2> Workqueue: kblockd blk_mq_run_work_fn
> 240.212649: <2> pstate: 00c00085 (nzcv daIf +PAN +UAO)
> 240.212666: <2> pc : queued_spin_lock_slowpath+0x10c/0x2e0
> 240.212677: <2> lr : queued_spin_lock_slowpath+0x84/0x2e0
> ...
> Call trace:
> 240.212865: <2> queued_spin_lock_slowpath+0x10c/0x2e0
> 240.212876: <2> do_raw_spin_lock+0xf0/0xf4
> 240.212890: <2> _raw_spin_lock_irq+0x74/0x94
> 240.212906: <2> bfq_dispatch_request+0x4c/0xd60
> 240.212918: <2> blk_mq_do_dispatch_sched+0xe0/0x1f0
> 240.212927: <2> blk_mq_sched_dispatch_requests+0x130/0x194
> 240.212940: <2> __blk_mq_run_hw_queue+0x100/0x158
> 240.212950: <2> blk_mq_run_work_fn+0x1c/0x28
> 240.212963: <2> process_one_work+0x280/0x460
> 240.212973: <2> worker_thread+0x27c/0x4dc
> 240.212986: <2> kthread+0x160/0x170
>
> Fix this by cancelling the delayed work, if any, before the elevator exits.
>
> Signed-off-by: Pradeep P V K <ppvk@xxxxxxxxxxxxxx>
> ---
>  block/blk-sysfs.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
> index 81722cd..e4a9aac 100644
> --- a/block/blk-sysfs.c
> +++ b/block/blk-sysfs.c
> @@ -779,6 +779,8 @@ static void blk_release_queue(struct kobject *kobj)
>  {
>  	struct request_queue *q =
>  		container_of(kobj, struct request_queue, kobj);
> +	struct blk_mq_hw_ctx *hctx;
> +	int i;

Please move the above two lines into the 'if (queue_is_mq(q))' branch.

>
>  	might_sleep();
>
> @@ -788,9 +790,11 @@ static void blk_release_queue(struct kobject *kobj)
>
>  	blk_free_queue_stats(q->stats);
>
> -	if (queue_is_mq(q))
> +	if (queue_is_mq(q)) {
>  		cancel_delayed_work_sync(&q->requeue_work);
> -
> +		queue_for_each_hw_ctx(q, hctx, i)
> +			cancel_delayed_work_sync(&hctx->run_work);
> +	}

Now the cancel_delayed_work_sync() in blk_mq_hw_sysfs_release() can be
removed as well.

We have to call cancel_delayed_work_sync() when the request queue's
refcount drops to zero, otherwise a new run-queue work may still be
started somewhere. So feel free to add the reviewed-by once the above
two comments are addressed:

Reviewed-by: Ming Lei <ming.lei@xxxxxxxxxx>

Thanks,
Ming Lei
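
For reference, a rough sketch of how blk_release_queue() might look once both
review comments are folded in. This is based only on the hunks quoted above;
the surrounding teardown code is elided and the exact layout of
block/blk-sysfs.c in the targeted kernel may differ:

static void blk_release_queue(struct kobject *kobj)
{
	struct request_queue *q =
		container_of(kobj, struct request_queue, kobj);

	might_sleep();

	/* ... unchanged teardown code ... */

	blk_free_queue_stats(q->stats);

	if (queue_is_mq(q)) {
		/* declarations now local to the mq-only branch */
		struct blk_mq_hw_ctx *hctx;
		int i;

		cancel_delayed_work_sync(&q->requeue_work);

		/*
		 * Make sure no run_work is still pending or running for any
		 * hw queue before the elevator (and e.g. bfqd) is freed.
		 */
		queue_for_each_hw_ctx(q, hctx, i)
			cancel_delayed_work_sync(&hctx->run_work);
	}

	/* ... rest of blk_release_queue() ... */
}

With the cancellation done here, where the queue's refcount has already
dropped to zero and no new run-queue work can be scheduled afterwards, the
matching cancel_delayed_work_sync() in blk_mq_hw_sysfs_release() becomes
redundant and could be dropped in the same patch.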