On 12/19/18 9:32 PM, jianchao.wang wrote:
>
>
> On 12/20/18 12:19 PM, Jens Axboe wrote:
>> On 12/19/18 8:24 PM, jianchao.wang wrote:
>>>
>>>
>>> On 12/20/18 11:17 AM, Jens Axboe wrote:
>>>> On 12/19/18 5:16 PM, Bart Van Assche wrote:
>>>>> On Wed, 2018-12-19 at 16:27 -0700, Jens Axboe wrote:
>>>>>> On 12/19/18 4:24 PM, Bart Van Assche wrote:
>>>>>>> Hello,
>>>>>>>
>>>>>>> If I run the srp blktests in a loop then I see the below call stack appearing
>>>>>>> sporadically. I have not yet had the time to analyze this, but I'm reporting
>>>>>>> it here in case someone else has already had a look at it.
>>>>>>>
>>>>>>> Bart.
>>>>>>>
>>>>>>> ==================================================================
>>>>>>> BUG: KASAN: use-after-free in bt_iter+0x86/0xf0
>>>>>>> Read of size 8 at addr ffff88803b335240 by task fio/21412
>>>>>>>
>>>>>>> CPU: 0 PID: 21412 Comm: fio Tainted: G        W         4.20.0-rc6-dbg+ #3
>>>>>>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
>>>>>>> Call Trace:
>>>>>>>  dump_stack+0x86/0xca
>>>>>>>  print_address_description+0x71/0x239
>>>>>>>  kasan_report.cold.5+0x242/0x301
>>>>>>>  __asan_load8+0x54/0x90
>>>>>>>  bt_iter+0x86/0xf0
>>>>>>>  blk_mq_queue_tag_busy_iter+0x373/0x5e0
>>>>>>>  blk_mq_in_flight+0x96/0xb0
>>>>>>>  part_in_flight+0x40/0x140
>>>>>>>  part_round_stats+0x18e/0x370
>>>>>>>  blk_account_io_start+0x3d7/0x670
>>>>>>>  blk_mq_bio_to_request+0x19c/0x3a0
>>>>>>>  blk_mq_make_request+0x7a9/0xcb0
>>>>>>>  generic_make_request+0x41d/0x960
>>>>>>>  submit_bio+0x9b/0x250
>>>>>>>  do_blockdev_direct_IO+0x435c/0x4c70
>>>>>>>  __blockdev_direct_IO+0x79/0x88
>>>>>>>  ext4_direct_IO+0x46c/0xc00
>>>>>>>  generic_file_direct_write+0x119/0x210
>>>>>>>  __generic_file_write_iter+0x11c/0x280
>>>>>>>  ext4_file_write_iter+0x1b8/0x6f0
>>>>>>>  aio_write+0x204/0x310
>>>>>>>  io_submit_one+0x9d3/0xe80
>>>>>>>  __x64_sys_io_submit+0x115/0x340
>>>>>>>  do_syscall_64+0x71/0x210
>>>>>>>  entry_SYSCALL_64_after_hwframe+0x49/0xbe
>>>>>>> RIP: 0033:0x7f02cf043219
>>>>>>
>>>>>> I've seen this one before as well, it's not a new thing. As far as I can
>>>>>> tell, it's a false positive. There should be no possibility for a
>>>>>> use-after-free iterating the static tags/requests.
>>>>>
>>>>> Are you sure this is a false positive?
>>>>
>>>> No I'm not, but the few times I have seen it, I haven't been able to
>>>> make much sense of it. It goes back quite a bit.
>>>>
>>>>> I have not yet encountered any false positive KASAN complaints. According
>>>>> to the following gdb output, this complaint refers to reading rq->q:
>>>>>
>>>>> (gdb) list *(bt_iter+0x86)
>>>>> 0xffffffff816b9346 is in bt_iter (block/blk-mq-tag.c:237).
>>>>> 232
>>>>> 233          /*
>>>>> 234           * We can hit rq == NULL here, because the tagging functions
>>>>> 235           * test and set the bit before assigning ->rqs[].
>>>>> 236           */
>>>>> 237          if (rq && rq->q == hctx->queue)
>>>>> 238                  iter_data->fn(hctx, rq, iter_data->data, reserved);
>>>>> 239          return true;
>>>>> 240  }
>>>>> 241
>>>>>
>>>>> From the disassembly output:
>>>>>
>>>>> 232
>>>>> 233          /*
>>>>> 234           * We can hit rq == NULL here, because the tagging functions
>>>>> 235           * test and set the bit before assigning ->rqs[].
>>>>> 236           */
>>>>> 237          if (rq && rq->q == hctx->queue)
>>>>>    0xffffffff816b9339 <+121>:  test   %r12,%r12
>>>>>    0xffffffff816b933c <+124>:  je     0xffffffff816b935f <bt_iter+159>
>>>>>    0xffffffff816b933e <+126>:  mov    %r12,%rdi
>>>>>    0xffffffff816b9341 <+129>:  callq  0xffffffff813bd3e0 <__asan_load8>
>>>>>    0xffffffff816b9346 <+134>:  lea    0x138(%r13),%rdi
>>>>>    0xffffffff816b934d <+141>:  mov    (%r12),%r14
>>>>>    0xffffffff816b9351 <+145>:  callq  0xffffffff813bd3e0 <__asan_load8>
>>>>>    0xffffffff816b9356 <+150>:  cmp    0x138(%r13),%r14
>>>>>    0xffffffff816b935d <+157>:  je     0xffffffff816b936f <bt_iter+175>
>>>>>
>>>>> BTW, rq may but does not have to refer to tags->static_rqs[...]. It may also
>>>>> refer to hctx->fq.flush_rq.
>>>>
>>>> But even those are persistent for the lifetime of the queue... And since
>>>> KASAN complains that the address belongs to a specific page, I'm guessing
>>>> it's one of the regular requests, since those are carved out of a chopped-up
>>>> page. Which means it makes even less sense.
>>>>
>>>> Is this happening while devices are being actively torn down? And
>>>> are you using shared tags? That's the only way I could see this
>>>> triggering.
>>>>
>>>
>>> Or could it be caused by a stale request left in a hctx->tags->rqs[] slot?
>>> We don't clear those entries after freeing the requests.
>>>
>>> And there could be a scenario like this: an io scheduler used to be attached,
>>> and after some workload it was detached. Requests allocated by the io
>>> scheduler could then be left behind in hctx->tags->rqs[].
>>>
>>> blk_mq_get_request                      blk_mq_queue_tag_busy_iter
>>>   -> blk_mq_get_tag
>>>                                           -> bt_for_each
>>>                                             -> bt_iter
>>>                                               -> rq = tags->rqs[]
>>>                                               -> rq->q
>>>   -> blk_mq_rq_ctx_init
>>>     -> data->hctx->tags->rqs[rq->tag] = rq;
>>>
>>> If this scenario is possible, maybe we could fix it as follows.
>>
>> Ah yes, good point, I bet that's what it is. But we just had this exact
>> discussion in another thread, and my point there was that we should
>> clear these when they go away, not inline. So how about clearing entries
>> when the sched tags go away?
>>
> I guess it should be OK. :)

Something like this. Totally untested... And I wonder if there's a more
efficient way to do this, not that it matters THAT much. But still.

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 2de972857496..341cb8b9cfb7 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2025,16 +2025,21 @@ void blk_mq_free_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
 {
 	struct page *page;
 
-	if (tags->rqs && set->ops->exit_request) {
-		int i;
+	if (tags->rqs) {
+		int i, j;
 
 		for (i = 0; i < tags->nr_tags; i++) {
 			struct request *rq = tags->static_rqs[i];
 
 			if (!rq)
 				continue;
-			set->ops->exit_request(set, rq, hctx_idx);
+			if (set->ops->exit_request)
+				set->ops->exit_request(set, rq, hctx_idx);
 			tags->static_rqs[i] = NULL;
+
+			for (j = 0; j < tags->nr_tags; j++)
+				if (tags->rqs[j] == rq)
+					tags->rqs[j] = NULL;
 		}
 	}

-- 
Jens Axboe
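For readers following the gdb output above, the bt_iter() it points into looks
roughly like this in kernels of this vintage. This is only a sketch
reconstructed around the quoted lines 232-241 of block/blk-mq-tag.c; the
declarations above the quoted lines are paraphrased from memory rather than
verified against 4.20.0-rc6, but they show where the unsynchronized read of
tags->rqs[] happens that the stale-entry scenario above would make dangerous:

static bool bt_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
{
	struct bt_iter_data *iter_data = data;
	struct blk_mq_hw_ctx *hctx = iter_data->hctx;
	struct blk_mq_tags *tags = hctx->tags;
	bool reserved = iter_data->reserved;
	struct request *rq;

	if (!reserved)
		bitnr += tags->nr_reserved_tags;
	/* Unsynchronized read of the tag -> request mapping. */
	rq = tags->rqs[bitnr];

	/*
	 * We can hit rq == NULL here, because the tagging functions
	 * test and set the bit before assigning ->rqs[].
	 */
	if (rq && rq->q == hctx->queue)	/* rq->q is the load KASAN flags */
		iter_data->fn(hctx, rq, iter_data->data, reserved);
	return true;
}

If tags->rqs[bitnr] still holds a pointer to a request whose backing pages
were freed (for example, scheduler-owned requests left behind after the io
scheduler was detached), the rq->q dereference reads freed memory, which is
exactly what the report above shows.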