Re: [PATCH V1] block: Fix use-after-free issue while accessing ioscheduler lock

Ming Lei <ming.lei@xxxxxxxxxx> · Tue, 15 Sep 2020 11:12:55 +0800

On Mon, Sep 14, 2020 at 07:42:39PM +0530, Pradeep P V K wrote:
> Observes below crash while accessing (use-after-free) lock member
> of bfq data.
> 
> context#1			context#2
> 				process_one_work()
> kthread()			blk_mq_run_work_fn()
> worker_thread()			 ->__blk_mq_run_hw_queue()
> process_one_work()		  ->blk_mq_sched_dispatch_requests()
> __blk_release_queue()		    ->blk_mq_do_dispatch_sched()
> ->__elevator_exit()
>   ->blk_mq_exit_sched()
>     ->exit_sched()
>       ->kfree()
> 				       ->bfq_dispatch_request()
> 				         ->spin_unlock_irq(&bfqd->lock)
> 
> This is because of the kblockd delayed work that might got scheduled
> around blk_release_queue() and accessed use-after-free member of
> bfq_data.
> 
> 240.212359:   <2> Unable to handle kernel paging request at
> virtual address ffffffee2e33ad70
> ...
> 240.212637:   <2> Workqueue: kblockd blk_mq_run_work_fn
> 240.212649:   <2> pstate: 00c00085 (nzcv daIf +PAN +UAO)
> 240.212666:   <2> pc : queued_spin_lock_slowpath+0x10c/0x2e0
> 240.212677:   <2> lr : queued_spin_lock_slowpath+0x84/0x2e0
> ...
> Call trace:
> 240.212865:   <2>  queued_spin_lock_slowpath+0x10c/0x2e0
> 240.212876:   <2>  do_raw_spin_lock+0xf0/0xf4
> 240.212890:   <2>  _raw_spin_lock_irq+0x74/0x94
> 240.212906:   <2>  bfq_dispatch_request+0x4c/0xd60
> 240.212918:   <2>  blk_mq_do_dispatch_sched+0xe0/0x1f0
> 240.212927:   <2>  blk_mq_sched_dispatch_requests+0x130/0x194
> 240.212940:   <2>  __blk_mq_run_hw_queue+0x100/0x158
> 240.212950:   <2>  blk_mq_run_work_fn+0x1c/0x28
> 240.212963:   <2>  process_one_work+0x280/0x460
> 240.212973:   <2>  worker_thread+0x27c/0x4dc
> 240.212986:   <2>  kthread+0x160/0x170
> 
> Fix this by cancelling the delayed work if any before elevator exits.
> 
> Signed-off-by: Pradeep P V K <ppvk@xxxxxxxxxxxxxx>
> ---
>  block/blk-sysfs.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
> index 81722cd..e4a9aac 100644
> --- a/block/blk-sysfs.c
> +++ b/block/blk-sysfs.c
> @@ -779,6 +779,8 @@ static void blk_release_queue(struct kobject *kobj)
>  {
>  	struct request_queue *q =
>  		container_of(kobj, struct request_queue, kobj);
> +	struct blk_mq_hw_ctx *hctx;
> +	int i;
>  
>  	might_sleep();
>  
> @@ -788,9 +790,11 @@ static void blk_release_queue(struct kobject *kobj)
>  
>  	blk_free_queue_stats(q->stats);
>  
> -	if (queue_is_mq(q))
> +	if (queue_is_mq(q)) {
>  		cancel_delayed_work_sync(&q->requeue_work);
> -
> +		queue_for_each_hw_ctx(q, hctx, i)
> +			cancel_delayed_work_sync(&hctx->run_work);
> +	}

hw queue may be run synchronously, such as from flush plug context.
However we have grabbed one usage ref for that.

So looks this approach is fine, but just wondering why not putting
the above change into blk_sync_queue() or blk_cleanup_queue() directly?

Thanks,
Ming