Re: [PATCH] block: fix null pointer dereference in blk_mq_rq_timed_out()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





On 9/12/19 5:29 AM, Yufen Yu wrote:


On 2019/9/12 10:46, Ming Lei wrote:
On Sat, Sep 07, 2019 at 06:24:50PM +0800, Yufen Yu wrote:
There is a race condition between timeout check and completion for
flush request as follow:

timeout_work    issue flush      issue flush
                 blk_insert_flush
                                  blk_insert_flush
blk_mq_timeout_work
                 blk_kick_flush

blk_mq_queue_tag_busy_iter
blk_mq_check_expired(flush_rq)

                 __blk_mq_end_request
                flush_end_io
                blk_kick_flush
                blk_rq_init(flush_rq)
                memset(flush_rq, 0)
Not see there is memset(flush_rq, 0) in block/blk-flush.c

Call path as follow:

blk_kick_flush
    blk_rq_init
        memset(rq, 0, sizeof(*rq));

blk_mq_timed_out
BUG_ON flush_rq->q->mq_ops
flush_rq->q won't be changed by blk_rq_init(), and either READ or WRITE
on variable with machine WORD length are atomic, so how can the BUG_ON()
be triggered? Do you have the actual BUG log?

Also now it is driver's responsibility for avoiding race between normal
completion and timeout.

I have reproduced the bug by adding time delay in timeout handle and memset.
BUG_ON log as follow:

[  108.825472] BUG: kernel NULL pointer dereference, address: 0000000000000040
[  108.826091] #PF: supervisor read access in kernel mode
[  108.826543] #PF: error_code(0x0000) - not-present page
[  108.827059] PGD 0 P4D 0
[  108.827313] Oops: 0000 [#1] SMP PTI
[  108.827657] CPU: 6 PID: 198 Comm: kworker/6:1H Not tainted 5.3.0-rc8+ #431 [  108.828326] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20180531_142017-buildhw-08.phx2.fedoraproject.org-1.fc28 04/01/2014
[  108.829503] Workqueue: kblockd blk_mq_timeout_work
[  108.829913] RIP: 0010:blk_mq_check_expired+0x258/0x330
[  108.830439] Code: 01 e9 0a ff ff ff 48 83 05 34 45 dd 02 01 4c 39 63 40 0f 84 8a 00 00 00 0d 00 00 20 00 40 0f b6 f5 41 89 44 24 1c 49 8b 04 24 <48> 8b 40 40 48 8b 40 20 48 85 c0 0f 84 90 00 00 00 48 83 05 2f 44
[  108.832246] RSP: 0018:ffffbf7ac18b7db0 EFLAGS: 00010206
[  108.832756] RAX: 0000000000000000 RBX: ffffffffb56e0250 RCX: 0000000000000000 [  108.833444] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9ab7fbb96538 [  108.834149] RBP: 0000000000000000 R08: 000000000000024b R09: 0000000000000030 [  108.834840] R10: 000000000000004e R11: ffffbf7ac18b7c40 R12: ffff9ab7f756e000 [  108.835531] R13: ffffbf7ac18b7e70 R14: 0000000000000017 R15: ffff9ab7f6ead0a0 [  108.836228] FS:  0000000000000000(0000) GS:ffff9ab7fbb80000(0000) knlGS:0000000000000000
[  108.837026] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  108.837588] CR2: 0000000000000040 CR3: 000000013544c000 CR4: 00000000000006e0
[  108.838191] Call Trace:
[  108.838406]  bt_iter+0x74/0x80
[  108.838665]  blk_mq_queue_tag_busy_iter+0x204/0x450
[  108.839074]  ? __switch_to_asm+0x34/0x70
[  108.839405]  ? blk_mq_stop_hw_queue+0x40/0x40
[  108.839823]  ? blk_mq_stop_hw_queue+0x40/0x40
[  108.840273]  ? syscall_return_via_sysret+0xf/0x7f
[  108.840732]  blk_mq_timeout_work+0x74/0x200
[  108.841151]  process_one_work+0x297/0x680
[  108.841550]  worker_thread+0x29c/0x6f0
[  108.841926]  ? rescuer_thread+0x580/0x580
[  108.842344]  kthread+0x16a/0x1a0
[  108.842666]  ? kthread_flush_work+0x170/0x170
[  108.843100]  ret_from_fork+0x35/0x40
[  108.843455] Modules linked in:
[  108.843758] CR2: 0000000000000040
[  108.844090] ---[ end trace e0ac552505fa1b95 ]---

blk_mq_rq_timed_out() attempt to read 'req->q->mq_ops->timeout', but 'q == 0' currently,
which triggers BUG_ON.

We have a similar calltrace which happened in older kernel (4.4.62), not sure if it is the same one.

[32353526.224059] CPU: 16 PID: 0 Comm: swapper/16 Tainted: G           O    4.4.62-1-storage #4.4.62-1.3
[32353526.224343] Hardware name: Supermicro SSG-2028R-ACR24L/X10DRH-iT, BIOS 3.1 06/18/2018
[...]
[32353526.224840] RIP: 0010:[<ffffffff812df5a1>] [<ffffffff812df5a1>] blk_mq_rq_timed_out+0x11/0x70
[32353526.285015]  [<ffffffff812df63d>] blk_mq_check_expired+0x3d/0x60
[32353526.301997]  [<ffffffff812e1f74>] bt_for_each+0xd4/0xe0
[32353526.310730]  [<ffffffff812df600>] ? blk_mq_rq_timed_out+0x70/0x70
[32353526.319579]  [<ffffffff812df600>] ? blk_mq_rq_timed_out+0x70/0x70
[32353526.328329]  [<ffffffff812e2753>] blk_mq_queue_tag_busy_iter+0x43/0xc0
[32353526.336995]  [<ffffffff812de7c0>] ? blk_mq_bio_to_request+0x40/0x40
[32353526.345566]  [<ffffffff812de7f2>] blk_mq_rq_timer+0x32/0xd0
[32353526.354094]  [<ffffffff810b2e45>] call_timer_fn+0x35/0x130
[32353526.362542]  [<ffffffff812de7c0>] ? blk_mq_bio_to_request+0x40/0x40
[32353526.370894]  [<ffffffff810b31b7>] run_timer_softirq+0x157/0x280

Thanks,
Guoqing




[Index of Archives]     [Linux RAID]     [Linux SCSI]     [Linux ATA RAID]     [IDE]     [Linux Wireless]     [Linux Kernel]     [ATH6KL]     [Linux Bluetooth]     [Linux Netdev]     [Kernel Newbies]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Device Mapper]

  Powered by Linux