Re: [PATCH] null_blk: fix command timeout completion handling

On 01/04/2021 00:52, Damien Le Moal wrote:
> Memory backed or zoned null block devices may generate actual request
> timeout errors due to the submission path being blocked on memory
> allocation or zone locking. Unlike fake timeouts or injected timeouts,
> the request submission path will call blk_mq_complete_request() or
> blk_mq_end_request() for these real timeout errors, causing a double
> completion and use after free situation as the block layer timeout
> handler executes blk_mq_rq_timed_out() and __blk_mq_free_request() in
> blk_mq_check_expired(). This problem often triggers a NULL pointer
> dereference such as:
> 
> BUG: kernel NULL pointer dereference, address: 0000000000000050
> RIP: 0010:blk_mq_sched_mark_restart_hctx+0x5/0x20
> ...
> Call Trace:
>   dd_finish_request+0x56/0x80
>   blk_mq_free_request+0x37/0x130
>   null_handle_cmd+0xbf/0x250 [null_blk]
>   ? null_queue_rq+0x67/0xd0 [null_blk]
>   blk_mq_dispatch_rq_list+0x122/0x850
>   __blk_mq_do_dispatch_sched+0xbb/0x2c0
>   __blk_mq_sched_dispatch_requests+0x13d/0x190
>   blk_mq_sched_dispatch_requests+0x30/0x60
>   __blk_mq_run_hw_queue+0x49/0x90
>   process_one_work+0x26c/0x580
>   worker_thread+0x55/0x3c0
>   ? process_one_work+0x580/0x580
>   kthread+0x134/0x150
>   ? kthread_create_worker_on_cpu+0x70/0x70
>   ret_from_fork+0x1f/0x30
> 
> This problem very often triggers when running the full btrfs xfstests
> on a memory-backed zoned null block device in a VM with a limited amount
> of memory.
> 
> Avoid this by executing blk_mq_complete_request() in null_timeout_rq()
> only for commands that are marked for a fake timeout completion using
> the fake_timeout boolean in struct null_cmd. For timeout errors injected
> through debugfs, the timeout handler will execute
> blk_mq_complete_request()i as before. This is safe as the submission
                           ^
Nit: stray "i" after blk_mq_complete_request().

> path does not execute complete requests in this case.
> 
> In null_timeout_rq(), also make sure to set the command error field to
> BLK_STS_TIMEOUT and to propagate this error through to the request
> completion.
> 
> Reported-by: Johannes Thumshirn <Johannes.Thumshirn@xxxxxxx>
> Signed-off-by: Damien Le Moal <damien.lemoal@xxxxxxx>
> ---

Tested-by: Johannes Thumshirn <Johannes.Thumshirn@xxxxxxx>
Reviewed-by: Johannes Thumshirn <Johannes.Thumshirn@xxxxxxx>

Thanks a lot



