Thanks Jens,

Imported those two commits, in addition to the commit that reintroduced the io_wq_current_is_worker() helper used by one of them. Re-tested on this base and no longer see the failure. Awesome!

Cheers --- Mark

Date: Tue, 17 Dec 2019 14:13:37 -0700
Subject: [PATCH] io-wq: re-add io_wq_current_is_worker()

This reverts commit 8cdda87a4414, we now have several use cases for this helper. Reinstate it.

Signed-off-by: Jens Axboe <axboe@xxxxxxxxx>
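For reference (not quoting the patch body here), the helper being re-added is just a small inline check in fs/io-wq.h; from memory it looks roughly like this:

    static inline bool io_wq_current_is_worker(void)
    {
    	/* true only for io-wq worker threads, which run with PF_IO_WORKER set */
    	return in_task() && (current->flags & PF_IO_WORKER);
    }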
-----Original Message-----
From: Jens Axboe <axboe@xxxxxxxxx>
Sent: Tuesday, February 11, 2020 11:45 AM
To: Wunderlich, Mark <mark.wunderlich@xxxxxxxxx>; linux-block@xxxxxxxxxxxxxxx
Cc: Sagi Grimberg <sagi@xxxxxxxxxxx>
Subject: Re: Fault seen with io_uring and nvmf/tcp

On 2/11/20 12:30 PM, Wunderlich, Mark wrote:
> Posting to this mail list in hopes someone has already seen this fault before I start digging. Using the nvme-5.5-rc branch of the git.infradead.org repo.
> Pulled this branch and running it un-modified.
> Performing FIO (io_uring) test (initiating on 8 host cores, TIME=30, RWMIX=100, BLOCK_SIZE=4k, DEPTH=32, BATCH=8), using the latest version of fio.
>
> cmd="fio --filename=/dev/nvme0n1 --time_based --runtime=$TIME
> --ramp_time=10 --thread --rw=randrw --rwmixread=$RWMIX
> --refill_buffers --direct=1 --ioengine=io_uring --hipri --fixedbufs
> --bs=$BLOCK_SIZE --iodepth=$DEPTH --iodepth_batch_complete_min=1
> --iodepth_batch_complete_max=$DEPTH --iodepth_batch=$BATCH --numjobs=1
> --group_reporting --gtod_reduce=0 --disable_lat=0 --name=cpu3
> --cpus_allowed=3 --name=cpu5 --cpus_allowed=5 --name=cpu7
> --cpus_allowed=7 --name=cpu9 --cpus_allowed=9 --name=cpu11
> --cpus_allowed=11 --name=cpu13 --cpus_allowed=13 --name=cpu15
> --cpus_allowed=15 --name=cpu17 --cpus_allowed=17
>
> NVMf TCP queue configuration is 1 default queue and 101 poll queues. Connected to a single remote NVMe ram disk device.
> I/O performs normally up to the 30 second run, but faults just at the end. Very repeatable.
>
> Thanks for your time --- Mark
>
> [64592.841944] nvme nvme0: mapped 1/0/101 default/read/poll queues.
> [64592.867003] nvme nvme0: new ctrl: NQN "testrd", addr 192.168.0.1:4420
> [64646.940588] list_add corruption. prev->next should be next (ffff9c1feb2bc7c8), but was ffff9c1ff7ee5368. (prev=ffff9c1ff7ee5468).
> [64646.941149] ------------[ cut here ]------------
> [64646.941150] kernel BUG at lib/list_debug.c:28!
> [64646.941360] invalid opcode: 0000 [#1] SMP PTI
> [64646.941561] CPU: 82 PID: 7886 Comm: io_wqe_worker-0 Tainted: G O 5.5.0-rc2stable+ #32
> [64646.941994] Hardware name: Dell Inc. PowerEdge R740/00WGD1, BIOS 1.4.9 06/29/2018
> [64646.942349] RIP: 0010:__list_add_valid+0x64/0x70
> [64646.942562] Code: 48 89 fe 31 c0 48 c7 c7 40 21 17 89 e8 f9 5c c6 ff 0f 0b 48 89 d1 48 c7 c7 e8 20 17 89 48 89 f2 48 89 c6 31 c0 e8 e0 5c c6 ff <0f> 0b 66 2e 0f 1f 84 00 00 00 00 00 48 8b 07 48 b9 00 01 00 00 00
> [64646.943442] RSP: 0018:ffffa78a49137d90 EFLAGS: 00010246
> [64646.943687] RAX: 0000000000000075 RBX: ffff9c1ff7ee5a00 RCX: 0000000000000000
> [64646.944021] RDX: 0000000000000000 RSI: ffff9c0fffe59d28 RDI: ffff9c0fffe59d28
> [64646.944356] RBP: ffffa78a49137df8 R08: 00000000000006ad R09: ffffffff88ec3be0
> [64646.944691] R10: 000000000000000f R11: 0000000007070707 R12: ffff9c1feb2bc600
> [64646.945025] R13: ffff9c1feb2bc7c8 R14: ffff9c1ff7ee5468 R15: ffff9c1ff7ee5a68
> [64646.945360] FS:  0000000000000000(0000) GS:ffff9c0fffe40000(0000) knlGS:0000000000000000
> [64646.945739] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [64646.946008] CR2: 00007f4423eb7004 CR3: 000000169940a005 CR4: 00000000007606e0
> [64646.946343] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [64646.946677] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [64646.947012] PKRU: 55555554
> [64646.947138] Call Trace:
> [64646.947260]  io_issue_sqe+0x115/0xa30
> [64646.947429]  io_wq_submit_work+0xb5/0x1d0
> [64646.947615]  io_worker_handle_work+0x19d/0x4c0
> [64646.947823]  io_wqe_worker+0xdc/0x390
> [64646.947998]  kthread+0xf8/0x130
> [64646.948141]  ? io_wq_for_each_worker+0xb0/0xb0
> [64646.948349]  ? kthread_bind+0x10/0x10
> [64646.948522]  ret_from_fork+0x35/0x40

I think you want to check that you have these in your tree:

commit 11ba820bf163e224bf5dd44e545a66a44a5b1d7a
Author: Jens Axboe <axboe@xxxxxxxxx>
Date:   Wed Jan 15 21:51:17 2020 -0700

    io_uring: ensure workqueue offload grabs ring mutex for poll list

and

commit 797f3f535d59f05ad12c629338beef6cb801d19e
Author: Bijan Mottahedeh <bijan.mottahedeh@xxxxxxxxxx>
Date:   Wed Jan 15 18:37:45 2020 -0800

    io_uring: clear req->result always before issuing a read/write request

--
Jens Axboe
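For anyone else chasing the same list_add corruption out of io_issue_sqe(): per its subject, the first commit above makes the io-wq offload path grab the ring mutex before putting a request on the IOPOLL list, since a worker thread does not already hold ctx->uring_lock the way the submitting task does. A rough sketch of the shape of that change in io_issue_sqe(), reconstructed from memory and not the literal diff (see the commit itself for the authoritative version):

    	if (ctx->flags & IORING_SETUP_IOPOLL) {
    		/* punted to io-wq? then we are not already holding uring_lock */
    		const bool in_async = io_wq_current_is_worker();

    		if (req->result == -EAGAIN)
    			return -EAGAIN;

    		if (in_async)
    			mutex_lock(&ctx->uring_lock);
    		/* adds the request to the ctx poll list; must be serialized */
    		io_iopoll_req_issued(req);
    		if (in_async)
    			mutex_unlock(&ctx->uring_lock);
    	}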