On 12/19/18 8:24 PM, jianchao.wang wrote:
> 
> 
> On 12/20/18 11:17 AM, Jens Axboe wrote:
>> On 12/19/18 5:16 PM, Bart Van Assche wrote:
>>> On Wed, 2018-12-19 at 16:27 -0700, Jens Axboe wrote:
>>>> On 12/19/18 4:24 PM, Bart Van Assche wrote:
>>>>> Hello,
>>>>>
>>>>> If I run the srp blktests in a loop then I see the below call stack appearing
>>>>> sporadically. I have not yet had the time to analyze this but I'm reporting
>>>>> this here in case someone else would already have had a look at this.
>>>>>
>>>>> Bart.
>>>>>
>>>>> ==================================================================
>>>>> BUG: KASAN: use-after-free in bt_iter+0x86/0xf0
>>>>> Read of size 8 at addr ffff88803b335240 by task fio/21412
>>>>>
>>>>> CPU: 0 PID: 21412 Comm: fio Tainted: G        W         4.20.0-rc6-dbg+ #3
>>>>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
>>>>> Call Trace:
>>>>>  dump_stack+0x86/0xca
>>>>>  print_address_description+0x71/0x239
>>>>>  kasan_report.cold.5+0x242/0x301
>>>>>  __asan_load8+0x54/0x90
>>>>>  bt_iter+0x86/0xf0
>>>>>  blk_mq_queue_tag_busy_iter+0x373/0x5e0
>>>>>  blk_mq_in_flight+0x96/0xb0
>>>>>  part_in_flight+0x40/0x140
>>>>>  part_round_stats+0x18e/0x370
>>>>>  blk_account_io_start+0x3d7/0x670
>>>>>  blk_mq_bio_to_request+0x19c/0x3a0
>>>>>  blk_mq_make_request+0x7a9/0xcb0
>>>>>  generic_make_request+0x41d/0x960
>>>>>  submit_bio+0x9b/0x250
>>>>>  do_blockdev_direct_IO+0x435c/0x4c70
>>>>>  __blockdev_direct_IO+0x79/0x88
>>>>>  ext4_direct_IO+0x46c/0xc00
>>>>>  generic_file_direct_write+0x119/0x210
>>>>>  __generic_file_write_iter+0x11c/0x280
>>>>>  ext4_file_write_iter+0x1b8/0x6f0
>>>>>  aio_write+0x204/0x310
>>>>>  io_submit_one+0x9d3/0xe80
>>>>>  __x64_sys_io_submit+0x115/0x340
>>>>>  do_syscall_64+0x71/0x210
>>>>>  entry_SYSCALL_64_after_hwframe+0x49/0xbe
>>>>> RIP: 0033:0x7f02cf043219
>>>>
>>>> I've seen this one before as well, it's not a new thing. As far as I can
>>>> tell, it's a false positive. There should be no possibility for a
>>>> use-after-free iterating the static tags/requests.
>>>
>>> Are you sure this is a false positive?
>>
>> No I'm not, but the few times I have seen it, I haven't been able to
>> make much sense of it. It goes back quite a bit.
>>
>>> I have not yet encountered any false
>>> positive KASAN complaints. According to the following gdb output this complaint
>>> refers to reading rq->q:
>>>
>>> (gdb) list *(bt_iter+0x86)
>>> 0xffffffff816b9346 is in bt_iter (block/blk-mq-tag.c:237).
>>> 232
>>> 233		/*
>>> 234		 * We can hit rq == NULL here, because the tagging functions
>>> 235		 * test and set the bit before assigning ->rqs[].
>>> 236		 */
>>> 237		if (rq && rq->q == hctx->queue)
>>> 238			iter_data->fn(hctx, rq, iter_data->data, reserved);
>>> 239		return true;
>>> 240	}
>>> 241
>>>
>>> From the disassembly output:
>>>
>>> 232
>>> 233		/*
>>> 234		 * We can hit rq == NULL here, because the tagging functions
>>> 235		 * test and set the bit before assigning ->rqs[].
>>> 236		 */
>>> 237		if (rq && rq->q == hctx->queue)
>>>    0xffffffff816b9339 <+121>:	test   %r12,%r12
>>>    0xffffffff816b933c <+124>:	je     0xffffffff816b935f <bt_iter+159>
>>>    0xffffffff816b933e <+126>:	mov    %r12,%rdi
>>>    0xffffffff816b9341 <+129>:	callq  0xffffffff813bd3e0 <__asan_load8>
>>>    0xffffffff816b9346 <+134>:	lea    0x138(%r13),%rdi
>>>    0xffffffff816b934d <+141>:	mov    (%r12),%r14
>>>    0xffffffff816b9351 <+145>:	callq  0xffffffff813bd3e0 <__asan_load8>
>>>    0xffffffff816b9356 <+150>:	cmp    0x138(%r13),%r14
>>>    0xffffffff816b935d <+157>:	je     0xffffffff816b936f <bt_iter+175>
>>>
>>> BTW, rq may but does not have to refer to tags->static_rqs[...]. It may also
>>> refer to hctx->fq.flush_rq.
>>
>> But even those are persistent for the lifetime of the queue... But since
>> kasan complains it belongs to a specific page, I'm guessing it's one
>> of the regular requests since those are out of a chopped up page. Which
>> means it makes even less sense.
>>
>> Is this happening while devices are being actively torn down? And
>> are you using shared tags? That's the only way I could see this
>> triggering.
>>
> 
> Or could it be caused by a stale request left in a hctx->tags->rqs[] slot?
> We don't clear it after freeing the requests.
> 
> And there could be a scenario like this:
> There used to be an io scheduler attached.
> After some workload, the io scheduler is detached.
> So there could be rqs allocated by the io scheduler left in hctx->tags->rqs.
> 
> blk_mq_get_request                        blk_mq_queue_tag_busy_iter
>   -> blk_mq_get_tag
>                                             -> bt_for_each
>                                               -> bt_iter
>                                                 -> rq = tags->rqs[]
>                                                 -> rq->q
>   -> blk_mq_rq_ctx_init
>       -> data->hctx->tags->rqs[rq->tag] = rq;
> 
> If this scenario is possible, maybe we could fix it as follows.

Ah yes, good point, I bet that's what it is. But we just had this exact
discussion in another thread, and my point there was that we should
clear these when they go away, not inline. So how about clearing
entries when the sched tags go away?

-- 
Jens Axboe
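A minimal sketch of the kind of clearing Jens is suggesting: before the
scheduler tags and the requests behind their static_rqs[] are freed, walk
the hardware context's driver tags and clear every ->rqs[] slot that still
points into the scheduler tag set, so a later blk_mq_queue_tag_busy_iter()
cannot chase a stale pointer from bt_iter(). The helper name and the
brute-force walk are assumptions for illustration only, not the actual
upstream fix; the struct blk_mq_tags fields used (nr_tags, rqs, static_rqs)
are as they exist in v4.20.

#include <linux/blk-mq.h>
#include "blk-mq-tag.h"		/* block-layer private: struct blk_mq_tags */

/*
 * Hypothetical helper (sketch only): run while tearing down the
 * scheduler tags, before their static_rqs[] requests are freed.
 * Any driver-tags ->rqs[] entry that still points at one of those
 * requests is cleared, so the busy-tag iterator no longer sees it.
 */
static void blk_mq_clear_sched_rq_refs(struct blk_mq_hw_ctx *hctx,
				       struct blk_mq_tags *sched_tags)
{
	struct blk_mq_tags *drv_tags = hctx->tags;
	unsigned int i, j;

	for (i = 0; i < drv_tags->nr_tags; i++) {
		struct request *rq = drv_tags->rqs[i];

		if (!rq)
			continue;

		/* Leftover reference to a scheduler-owned request? */
		for (j = 0; j < sched_tags->nr_tags; j++) {
			if (rq == sched_tags->static_rqs[j]) {
				drv_tags->rqs[i] = NULL;
				break;
			}
		}
	}
}

A real patch would also have to order the clearing against an iterator
that may already have loaded the stale pointer; the sketch above does not
attempt to handle that race.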