On 12/20/18 11:17 AM, Jens Axboe wrote:
> On 12/19/18 5:16 PM, Bart Van Assche wrote:
>> On Wed, 2018-12-19 at 16:27 -0700, Jens Axboe wrote:
>>> On 12/19/18 4:24 PM, Bart Van Assche wrote:
>>>> Hello,
>>>>
>>>> If I run the srp blktests in a loop then I see the below call stack appearing
>>>> sporadically. I have not yet had the time to analyze this, but I'm reporting
>>>> it here in case someone else has already had a look at it.
>>>>
>>>> Bart.
>>>>
>>>> ==================================================================
>>>> BUG: KASAN: use-after-free in bt_iter+0x86/0xf0
>>>> Read of size 8 at addr ffff88803b335240 by task fio/21412
>>>>
>>>> CPU: 0 PID: 21412 Comm: fio Tainted: G        W    4.20.0-rc6-dbg+ #3
>>>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
>>>> Call Trace:
>>>>  dump_stack+0x86/0xca
>>>>  print_address_description+0x71/0x239
>>>>  kasan_report.cold.5+0x242/0x301
>>>>  __asan_load8+0x54/0x90
>>>>  bt_iter+0x86/0xf0
>>>>  blk_mq_queue_tag_busy_iter+0x373/0x5e0
>>>>  blk_mq_in_flight+0x96/0xb0
>>>>  part_in_flight+0x40/0x140
>>>>  part_round_stats+0x18e/0x370
>>>>  blk_account_io_start+0x3d7/0x670
>>>>  blk_mq_bio_to_request+0x19c/0x3a0
>>>>  blk_mq_make_request+0x7a9/0xcb0
>>>>  generic_make_request+0x41d/0x960
>>>>  submit_bio+0x9b/0x250
>>>>  do_blockdev_direct_IO+0x435c/0x4c70
>>>>  __blockdev_direct_IO+0x79/0x88
>>>>  ext4_direct_IO+0x46c/0xc00
>>>>  generic_file_direct_write+0x119/0x210
>>>>  __generic_file_write_iter+0x11c/0x280
>>>>  ext4_file_write_iter+0x1b8/0x6f0
>>>>  aio_write+0x204/0x310
>>>>  io_submit_one+0x9d3/0xe80
>>>>  __x64_sys_io_submit+0x115/0x340
>>>>  do_syscall_64+0x71/0x210
>>>>  entry_SYSCALL_64_after_hwframe+0x49/0xbe
>>>> RIP: 0033:0x7f02cf043219
>>>
>>> I've seen this one before as well; it's not a new thing. As far as I can
>>> tell, it's a false positive. There should be no possibility for a
>>> use-after-free when iterating the static tags/requests.
>>
>> Are you sure this is a false positive?
>
> No I'm not, but the few times I have seen it, I haven't been able to
> make much sense of it. It goes back quite a bit.
>
>> I have not yet encountered any false
>> positive KASAN complaints. According to the following gdb output this complaint
>> refers to reading rq->q:
>>
>> (gdb) list *(bt_iter+0x86)
>> 0xffffffff816b9346 is in bt_iter (block/blk-mq-tag.c:237).
>> 232
>> 233         /*
>> 234          * We can hit rq == NULL here, because the tagging functions
>> 235          * test and set the bit before assigning ->rqs[].
>> 236          */
>> 237         if (rq && rq->q == hctx->queue)
>> 238                 iter_data->fn(hctx, rq, iter_data->data, reserved);
>> 239         return true;
>> 240 }
>> 241
>>
>> From the disassembly output:
>>
>> 232
>> 233         /*
>> 234          * We can hit rq == NULL here, because the tagging functions
>> 235          * test and set the bit before assigning ->rqs[].
>> 236          */
>> 237         if (rq && rq->q == hctx->queue)
>>    0xffffffff816b9339 <+121>:  test   %r12,%r12
>>    0xffffffff816b933c <+124>:  je     0xffffffff816b935f <bt_iter+159>
>>    0xffffffff816b933e <+126>:  mov    %r12,%rdi
>>    0xffffffff816b9341 <+129>:  callq  0xffffffff813bd3e0 <__asan_load8>
>>    0xffffffff816b9346 <+134>:  lea    0x138(%r13),%rdi
>>    0xffffffff816b934d <+141>:  mov    (%r12),%r14
>>    0xffffffff816b9351 <+145>:  callq  0xffffffff813bd3e0 <__asan_load8>
>>    0xffffffff816b9356 <+150>:  cmp    0x138(%r13),%r14
>>    0xffffffff816b935d <+157>:  je     0xffffffff816b936f <bt_iter+175>
>>
>> BTW, rq may but does not have to refer to tags->static_rqs[...]. It may also
>> refer to hctx->fq.flush_rq.
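To make the window that kernel comment describes easier to see, here is a
small stand-alone model of the three paths involved. This is not kernel
code: the model_* names, the plain unsigned long standing in for the
sbitmap, and the fn callback are simplifications made up for illustration.

/*
 * Allocation publishes the tag bit before assigning ->rqs[], and freeing
 * a tag clears only the bit, so an iterator that finds a set bit can see
 * rqs[tag] as NULL (the benign case the comment mentions) or as a pointer
 * to whatever request last used that tag.
 */
struct model_request {
        void *q;                        /* stands in for rq->q */
};

struct model_tags {
        unsigned long bitmap;           /* one bit per tag, 64 tags max */
        struct model_request *rqs[64];  /* last request seen on each tag */
};

/* alloc side, cf. blk_mq_get_tag() followed by blk_mq_rq_ctx_init() */
static void model_get_tag(struct model_tags *t, unsigned int tag,
                          struct model_request *rq)
{
        t->bitmap |= 1UL << tag;        /* the bit becomes visible first ... */
        t->rqs[tag] = rq;               /* ... the slot is assigned afterward */
}

/* free side, cf. blk_mq_put_tag(): the old rqs[tag] pointer remains */
static void model_put_tag(struct model_tags *t, unsigned int tag)
{
        t->bitmap &= ~(1UL << tag);
}

/*
 * iterator side, cf. bt_for_each()/bt_iter(): the rq->q load below is
 * the one KASAN flags at bt_iter+0x86 when rq points to freed memory
 */
static void model_iter(struct model_tags *t, void *queue,
                       void (*fn)(struct model_request *rq))
{
        unsigned int tag;

        for (tag = 0; tag < 64; tag++) {
                struct model_request *rq = t->rqs[tag];

                if ((t->bitmap & (1UL << tag)) && rq && rq->q == queue)
                        fn(rq);
        }
}

Nothing on the free side ever clears rqs[tag]; whether the stale pointer
is still backed by valid memory is exactly what the rest of this thread
is about.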
> But even those are persistent for the lifetime of the queue... But since
> kasan complains it belongs to a specific page, I'm guessing it's one
> of the regular requests, since those are out of a chopped-up page. Which
> means it makes even less sense.
>
> Is this happening while devices are being actively torn down? And
> are you using shared tags? That's the only way I could see this
> triggering.

Or could it be caused by a stale request left in an hctx->tags->rqs[]
slot? We don't clear the slot after freeing the request. So a scenario
like the following seems possible: an io scheduler used to be attached
and, after some workload, was detached, leaving requests that the io
scheduler had allocated behind in hctx->tags->rqs[]. The iterator can
then pick up such a stale entry in the window between blk_mq_get_tag()
setting the tag bit and blk_mq_rq_ctx_init() overwriting the slot:

blk_mq_get_request                     blk_mq_queue_tag_busy_iter
  -> blk_mq_get_tag                      -> bt_for_each
                                           -> bt_iter
                                             -> rq = tags->rqs[]
                                             -> rq->q
  -> blk_mq_rq_ctx_init
    -> data->hctx->tags->rqs[rq->tag] = rq;

If this scenario is possible, maybe we could fix it as follows.

---
 block/blk-mq.c | 1 +
 block/blk-mq.h | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 6a75662..ad55226 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -477,6 +477,7 @@ static void __blk_mq_free_request(struct request *rq)
 	const int sched_tag = rq->internal_tag;
 
 	blk_pm_mark_last_busy(rq);
+	hctx->tags->rqs[rq->tag] = NULL;
 	if (rq->tag != -1)
 		blk_mq_put_tag(hctx, hctx->tags, ctx, rq->tag);
 	if (sched_tag != -1)
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 9497b47..675b681 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -190,6 +190,7 @@ static inline void blk_mq_put_driver_tag_hctx(struct blk_mq_hw_ctx *hctx,
 	if (rq->tag == -1 || rq->internal_tag == -1)
 		return;
 
+	hctx->tags->rqs[rq->tag] = NULL;
 	__blk_mq_put_driver_tag(hctx, rq);
 }
 
@@ -201,6 +202,7 @@ static inline void blk_mq_put_driver_tag(struct request *rq)
 		return;
 
 	hctx = blk_mq_map_queue(rq->q, rq->mq_ctx->cpu);
+	hctx->tags->rqs[rq->tag] = NULL;
 	__blk_mq_put_driver_tag(hctx, rq);
 }
-- 
2.7.4

Thanks
Jianchao
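P.S. In terms of the toy model from earlier in this mail, the proposed
change amounts to the sketch below. Again, model_* is made-up
illustration code, not the actual kernel functions, and the struct is
repeated so the snippet stands on its own.

#include <stddef.h>

struct model_request;                   /* stands in for struct request */

struct model_tags {                     /* repeated from the earlier sketch */
        unsigned long bitmap;
        struct model_request *rqs[64];
};

/*
 * Free side with the proposed change applied: clear the slot before
 * dropping the tag bit, so an iterator that finds the bit set sees
 * either NULL or the current owner of the tag, never a request left
 * behind by, e.g., a detached io scheduler.
 */
static void model_put_tag_fixed(struct model_tags *t, unsigned int tag)
{
        t->rqs[tag] = NULL;     /* cf. hctx->tags->rqs[rq->tag] = NULL; */
        t->bitmap &= ~(1UL << tag);
}

Note that this narrows the window rather than fully serializing the two
paths: an iterator that loaded rqs[tag] just before the store can still
dereference that request, so it still relies on the request memory being
valid at that instant.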