[RFC 0/2] io_uring: examine request result only after completion

Running an fio test consistently crashes the kernel with the trace included
below.  The root cause seems to be the code in __io_submit_sqe() that
checks the result of a request for -EAGAIN in polled mode, without
ensuring first that the request has completed:

	if (ctx->flags & IORING_SETUP_IOPOLL) {
		if (req->result == -EAGAIN)
			return -EAGAIN;

The request will be immediately resubmitted in io_sq_wq_submit_work(),
potentially before the first submission has completed.  This creates
a race where the original completion may set REQ_F_IOPOLL_COMPLETED in
a freed submission request, overwriting the poisoned bits and causing
the panic below.

	do {
		ret = __io_submit_sqe(ctx, req, s, false);
		/*
		 * We can get EAGAIN for polled IO even though
		 * we're forcing a sync submission from here,
		 * since we can't wait for request slots on the
		 * block side.
		 */
		if (ret != -EAGAIN)
			break;
		cond_resched();
	} while (1);

The suggested fix is to move a submitted request to the poll list
unconditionally in polled mode.  The request can then be retried if
necessary once the original submission has indeed completed.
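
As a rough illustration of this idea (not the patch itself), the
user-space sketch below models a poll list on which every submitted
request is queued unconditionally, and where the result is examined
only after the completion side has marked the request as done.  All
model_* names and MODEL_F_IOPOLL_COMPLETED are hypothetical stand-ins;
locking, memory barriers and the actual resubmission are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical user-space model; this is not the kernel code. */
#define MODEL_F_IOPOLL_COMPLETED 0x1u

struct model_req {
	unsigned int flags;	/* written by the completion side */
	int result;		/* meaningful only once COMPLETED is set */
	struct model_req *next;
};

struct model_ctx {
	struct model_req *poll_list;	/* requests still owned by polling */
};

/* Submission: always queue for polling; never inspect req->result here. */
static void model_submit(struct model_ctx *ctx, struct model_req *req)
{
	req->flags = 0;
	req->result = 0;
	req->next = ctx->poll_list;
	ctx->poll_list = req;
}

/* Completion side: publish the result, then mark the request completed. */
static void model_complete(struct model_req *req, int res)
{
	/* A request must not be completed twice in this model. */
	assert(!(req->flags & MODEL_F_IOPOLL_COMPLETED));
	req->result = res;
	req->flags |= MODEL_F_IOPOLL_COMPLETED;
}

/*
 * Reap: unlink every completed request from the poll list.  Returns the
 * number that finished with a final result; *nr_retry counts those that
 * completed with -EAGAIN and would have to be resubmitted.  Requests
 * that have not completed stay on the list and are left untouched.
 */
static int model_reap(struct model_ctx *ctx, int *nr_retry)
{
	struct model_req **pp = &ctx->poll_list;
	int done = 0;

	*nr_retry = 0;
	while (*pp) {
		struct model_req *req = *pp;

		if (!(req->flags & MODEL_F_IOPOLL_COMPLETED)) {
			pp = &req->next;	/* still in flight, skip */
			continue;
		}
		*pp = req->next;		/* unlink completed request */
		req->next = NULL;
		if (req->result == -EAGAIN)
			(*nr_retry)++;
		else
			done++;
	}
	return done;
}
```

In this model a request is never retried or released while its first
submission is still in flight, which is the property the race above
violates.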

This bug raises a further question, however: REQ_F_IOPOLL_COMPLETED is
set in io_complete_rw_iopoll() from interrupt context, yet NVMe polled
queues are not supposed to generate interrupts, so the reason for this
apparent inconsistency is not clear.

fio job
-------
[global]
filename=/dev/nvme0n1
rw=randread
bs=4k
size=4G
direct=1
time_based=1
runtime=60
randrepeat=1
gtod_reduce=1

fio test
--------
fio nvme.fio --readonly --ioengine=io_uring --iodepth 1024 --fixedbufs --hipri --numjobs=8

panic trace
-----------
[  450.395076] BUG io_kiocb (Not tainted): Poison overwritten
[  450.537797] -----------------------------------------------------------------------------
[  450.537799] INFO: 0x00000000cb333e0b-0x00000000cb333e0b. First byte 0x7b instead of 0x6b
[  450.656496] RIP: 0010:blkdev_bio_end_io+0x71/0xe0
[  450.772066] INFO: Allocated in io_submit_sqe+0x84/0x3d0 age=555 cpu=9 pid=3665
[  450.772070]  __slab_alloc+0x40/0x62
[  450.868914] Code: 75 3c 0f b6 43 32 4c 8b 2b 84 c0 75 0a 48 8b 73 08 49 01 75 08 eb 0b 0f b6 f8 e8 aa 9c 0e 00 48 63 f0 48 8b 03 31 d2 4c 89 ef <ff> 50 10 f6 43 14 01 74 32 48 8d 7b 18 e8 0d 56 0e 00 eb 27 48 8b
[  450.925197]  kmem_cache_alloc+0xa3/0x260
[  450.925198]  io_submit_sqe+0x84/0x3d0
[  451.011642] RSP: 0018:ffffc90006908e28 EFLAGS: 00010046
[  451.053353]  io_ring_submit+0xd5/0x150
[  451.053355]  __x64_sys_io_uring_enter+0x14e/0x290
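
As a small aid for reading the "Poison overwritten" line above, the
sketch below compares the observed byte against the slab free-object
poison byte 0x6b (POISON_FREE in include/linux/poison.h).  The helper
names are hypothetical.  The check only shows that a single bit
differs, which is consistent with a flag being OR'ed into freed
memory; it does not by itself identify which field or flag was
written.

```c
#include <assert.h>
#include <stdint.h>

/* Slab free-object poison byte; value taken from include/linux/poison.h. */
#define MODEL_POISON_FREE 0x6bu

/* Bits in which the observed byte differs from the poison pattern. */
static uint8_t poison_diff(uint8_t observed)
{
	return (uint8_t)(observed ^ MODEL_POISON_FREE);
}

/* Nonzero when exactly one bit differs. */
static int is_single_bit(uint8_t diff)
{
	return diff != 0 && (diff & (diff - 1)) == 0;
}

/* Byte reported in the trace: 0x7b instead of 0x6b. */
static void check_trace_byte(void)
{
	assert(poison_diff(0x7b) == 0x10);
	assert(is_single_bit(poison_diff(0x7b)));
}
```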

Bijan Mottahedeh (2):
  io_uring: create io_queue_async() function
  io_uring: examine request result only after completion

 fs/io_uring.c | 122 ++++++++++++++++++++++++++++++++++++++++++----------------
 1 file changed, 89 insertions(+), 33 deletions(-)

-- 
1.8.3.1



