On 12/20/18 11:17 AM, Jens Axboe wrote:
> On 12/19/18 5:16 PM, Bart Van Assche wrote:
>> On Wed, 2018-12-19 at 16:27 -0700, Jens Axboe wrote:
>>> On 12/19/18 4:24 PM, Bart Van Assche wrote:
>>>> Hello,
>>>>
>>>> If I run the srp blktests in a loop then I see the below call stack appearing
>>>> sporadically. I have not yet had the time to analyze this, but I'm reporting
>>>> it here in case someone else has already had a look at it.
>>>>
>>>> Bart.
>>>>
>>>> ==================================================================
>>>> BUG: KASAN: use-after-free in bt_iter+0x86/0xf0
>>>> Read of size 8 at addr ffff88803b335240 by task fio/21412
>>>>
>>>> CPU: 0 PID: 21412 Comm: fio Tainted: G        W    4.20.0-rc6-dbg+ #3
>>>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
>>>> Call Trace:
>>>>  dump_stack+0x86/0xca
>>>>  print_address_description+0x71/0x239
>>>>  kasan_report.cold.5+0x242/0x301
>>>>  __asan_load8+0x54/0x90
>>>>  bt_iter+0x86/0xf0
>>>>  blk_mq_queue_tag_busy_iter+0x373/0x5e0
>>>>  blk_mq_in_flight+0x96/0xb0
>>>>  part_in_flight+0x40/0x140
>>>>  part_round_stats+0x18e/0x370
>>>>  blk_account_io_start+0x3d7/0x670
>>>>  blk_mq_bio_to_request+0x19c/0x3a0
>>>>  blk_mq_make_request+0x7a9/0xcb0
>>>>  generic_make_request+0x41d/0x960
>>>>  submit_bio+0x9b/0x250
>>>>  do_blockdev_direct_IO+0x435c/0x4c70
>>>>  __blockdev_direct_IO+0x79/0x88
>>>>  ext4_direct_IO+0x46c/0xc00
>>>>  generic_file_direct_write+0x119/0x210
>>>>  __generic_file_write_iter+0x11c/0x280
>>>>  ext4_file_write_iter+0x1b8/0x6f0
>>>>  aio_write+0x204/0x310
>>>>  io_submit_one+0x9d3/0xe80
>>>>  __x64_sys_io_submit+0x115/0x340
>>>>  do_syscall_64+0x71/0x210
>>>>  entry_SYSCALL_64_after_hwframe+0x49/0xbe
>>>> RIP: 0033:0x7f02cf043219
>>>
>>> I've seen this one before as well; it's not a new thing. As far as I can
>>> tell, it's a false positive. There should be no possibility for a
>>> use-after-free when iterating the static tags/requests.
>>
>> Are you sure this is a false positive?
>
> No I'm not, but the few times I have seen it, I haven't been able to
> make much sense of it. It goes back quite a bit.
>
>> I have not yet encountered any false
>> positive KASAN complaints. According to the following gdb output this complaint
>> refers to reading rq->q:
>>
>> (gdb) list *(bt_iter+0x86)
>> 0xffffffff816b9346 is in bt_iter (block/blk-mq-tag.c:237).
>> 232
>> 233         /*
>> 234          * We can hit rq == NULL here, because the tagging functions
>> 235          * test and set the bit before assigning ->rqs[].
>> 236          */
>> 237         if (rq && rq->q == hctx->queue)
>> 238                 iter_data->fn(hctx, rq, iter_data->data, reserved);
>> 239         return true;
>> 240 }
>> 241
>>
>> From the disassembly output:
>>
>> 232
>> 233         /*
>> 234          * We can hit rq == NULL here, because the tagging functions
>> 235          * test and set the bit before assigning ->rqs[].
>> 236          */
>> 237         if (rq && rq->q == hctx->queue)
>>    0xffffffff816b9339 <+121>:  test   %r12,%r12
>>    0xffffffff816b933c <+124>:  je     0xffffffff816b935f <bt_iter+159>
>>    0xffffffff816b933e <+126>:  mov    %r12,%rdi
>>    0xffffffff816b9341 <+129>:  callq  0xffffffff813bd3e0 <__asan_load8>
>>    0xffffffff816b9346 <+134>:  lea    0x138(%r13),%rdi
>>    0xffffffff816b934d <+141>:  mov    (%r12),%r14
>>    0xffffffff816b9351 <+145>:  callq  0xffffffff813bd3e0 <__asan_load8>
>>    0xffffffff816b9356 <+150>:  cmp    0x138(%r13),%r14
>>    0xffffffff816b935d <+157>:  je     0xffffffff816b936f <bt_iter+175>
>>
>> BTW, rq may but does not have to refer to tags->static_rqs[...]. It may also
>> refer to hctx->fq.flush_rq.
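To make the window that kernel comment describes easier to see, here is a
small stand-alone model of the three paths involved. This is not kernel
code: the model_* names, the plain unsigned long standing in for the
sbitmap, and the fn callback are simplifications made up for illustration.

/*
 * Allocation publishes the tag bit before assigning ->rqs[], and freeing
 * a tag clears only the bit, so an iterator that finds a set bit can see
 * rqs[tag] as NULL (the benign case the comment mentions) or as a pointer
 * to whatever request last used that tag.
 */
struct model_request {
        void *q;                        /* stands in for rq->q */
};

struct model_tags {
        unsigned long bitmap;           /* one bit per tag, 64 tags max */
        struct model_request *rqs[64];  /* last request seen on each tag */
};

/* alloc side, cf. blk_mq_get_tag() followed by blk_mq_rq_ctx_init() */
static void model_get_tag(struct model_tags *t, unsigned int tag,
                          struct model_request *rq)
{
        t->bitmap |= 1UL << tag;        /* the bit becomes visible first ... */
        t->rqs[tag] = rq;               /* ... the slot is assigned afterward */
}

/* free side, cf. blk_mq_put_tag(): the old rqs[tag] pointer remains */
static void model_put_tag(struct model_tags *t, unsigned int tag)
{
        t->bitmap &= ~(1UL << tag);
}

/*
 * iterator side, cf. bt_for_each()/bt_iter(): the rq->q load below is
 * the one KASAN flags at bt_iter+0x86 when rq points to freed memory
 */
static void model_iter(struct model_tags *t, void *queue,
                       void (*fn)(struct model_request *rq))
{
        unsigned int tag;

        for (tag = 0; tag < 64; tag++) {
                struct model_request *rq = t->rqs[tag];

                if ((t->bitmap & (1UL << tag)) && rq && rq->q == queue)
                        fn(rq);
        }
}

Nothing on the free side ever clears rqs[tag]; whether the stale pointer
is still backed by valid memory is exactly what the rest of this thread
is about.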
> But even those are persistent for the lifetime of the queue... But since
> kasan complains it belongs to a specific page, I'm guessing it's one
> of the regular requests, since those are out of a chopped-up page. Which
> means it makes even less sense.
>
> Is this happening while devices are being actively torn down? And
> are you using shared tags? That's the only way I could see this
> triggering.

Or could it be caused by a stale request left in an hctx->tags->rqs[]
slot? We don't clear the slot after freeing the request. So a scenario
like the following seems possible: an io scheduler used to be attached
and, after some workload, was detached, leaving requests that the io
scheduler had allocated behind in hctx->tags->rqs[]. The iterator can
then pick up such a stale entry in the window between blk_mq_get_tag()
setting the tag bit and blk_mq_rq_ctx_init() overwriting the slot:

blk_mq_get_request                     blk_mq_queue_tag_busy_iter
  -> blk_mq_get_tag                      -> bt_for_each
                                           -> bt_iter
                                             -> rq = tags->rqs[]
                                             -> rq->q
  -> blk_mq_rq_ctx_init
    -> data->hctx->tags->rqs[rq->tag] = rq;

If this scenario is possible, maybe we could fix it as follows.

---
 block/blk-mq.c | 1 +
 block/blk-mq.h | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 6a75662..ad55226 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -477,6 +477,7 @@ static void __blk_mq_free_request(struct request *rq)
 	const int sched_tag = rq->internal_tag;
 
 	blk_pm_mark_last_busy(rq);
+	hctx->tags->rqs[rq->tag] = NULL;
 	if (rq->tag != -1)
 		blk_mq_put_tag(hctx, hctx->tags, ctx, rq->tag);
 	if (sched_tag != -1)
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 9497b47..675b681 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -190,6 +190,7 @@ static inline void blk_mq_put_driver_tag_hctx(struct blk_mq_hw_ctx *hctx,
 	if (rq->tag == -1 || rq->internal_tag == -1)
 		return;
 
+	hctx->tags->rqs[rq->tag] = NULL;
 	__blk_mq_put_driver_tag(hctx, rq);
 }
 
@@ -201,6 +202,7 @@ static inline void blk_mq_put_driver_tag(struct request *rq)
 		return;
 
 	hctx = blk_mq_map_queue(rq->q, rq->mq_ctx->cpu);
+	hctx->tags->rqs[rq->tag] = NULL;
 	__blk_mq_put_driver_tag(hctx, rq);
 }
-- 
2.7.4

Thanks
Jianchao
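P.S. In terms of the toy model from earlier in this mail, the proposed
change amounts to the sketch below. Again, model_* is made-up
illustration code, not the actual kernel functions, and the struct is
repeated so the snippet stands on its own.

#include <stddef.h>

struct model_request;                   /* stands in for struct request */

struct model_tags {                     /* repeated from the earlier sketch */
        unsigned long bitmap;
        struct model_request *rqs[64];
};

/*
 * Free side with the proposed change applied: clear the slot before
 * dropping the tag bit, so an iterator that finds the bit set sees
 * either NULL or the current owner of the tag, never a request left
 * behind by, e.g., a detached io scheduler.
 */
static void model_put_tag_fixed(struct model_tags *t, unsigned int tag)
{
        t->rqs[tag] = NULL;     /* cf. hctx->tags->rqs[rq->tag] = NULL; */
        t->bitmap &= ~(1UL << tag);
}

Note that this narrows the window rather than fully serializing the two
paths: an iterator that loaded rqs[tag] just before the store can still
dereference that request, so it still relies on the request memory being
valid at that instant.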