On 2/11/20 12:30 PM, Wunderlich, Mark wrote:
> Posting to this mail list in hopes someone has already seen this fault before I start digging. Using the nvme-5.5-rc branch of the git.infradead.org repo.
> Pulled this branch and running un-modified.
> Performing FIO (io_uring) test: (initiating on 8 host cores, TIME=30, RWMIX=100, BLOCK_SIZE=4k, DEPTH=32, BATCH=8), using latest version of fio.
>
> cmd="fio --filename=/dev/nvme0n1 --time_based --runtime=$TIME --ramp_time=10 --thread --rw=randrw --rwmixread=$RWMIX --refill_buffers --direct=1 --ioengine=io_uring --hipri --fixedbufs --bs=$BLOCK_SIZE --iodepth=$DEPTH --iodepth_batch_complete_min=1 --iodepth_batch_complete_max=$DEPTH --iodepth_batch=$BATCH --numjobs=1 --group_reporting --gtod_reduce=0 --disable_lat=0 --name=cpu3 --cpus_allowed=3 --name=cpu5 --cpus_allowed=5 --name=cpu7 --cpus_allowed=7 --name=cpu9 --cpus_allowed=9 --name=cpu11 --cpus_allowed=11 --name=cpu13 --cpus_allowed=13 --name=cpu15 --cpus_allowed=15 --name=cpu17 --cpus_allowed=17"
>
> NVMf TCP queue configuration is 1 default queue and 101 poll queues. Connected to a single remote NVMe ram disk device.
> I/O performs normally up to the end of the 30 second run, but faults just at the end. Very repeatable.
>
> Thanks for your time --- Mark
>
> [64592.841944] nvme nvme0: mapped 1/0/101 default/read/poll queues.
> [64592.867003] nvme nvme0: new ctrl: NQN "testrd", addr 192.168.0.1:4420
> [64646.940588] list_add corruption. prev->next should be next (ffff9c1feb2bc7c8), but was ffff9c1ff7ee5368. (prev=ffff9c1ff7ee5468).
> [64646.941149] ------------[ cut here ]------------
> [64646.941150] kernel BUG at lib/list_debug.c:28!
> [64646.941360] invalid opcode: 0000 [#1] SMP PTI
> [64646.941561] CPU: 82 PID: 7886 Comm: io_wqe_worker-0 Tainted: G O 5.5.0-rc2stable+ #32
> [64646.941994] Hardware name: Dell Inc. PowerEdge R740/00WGD1, BIOS 1.4.9 06/29/2018
> [64646.942349] RIP: 0010:__list_add_valid+0x64/0x70
> [64646.942562] Code: 48 89 fe 31 c0 48 c7 c7 40 21 17 89 e8 f9 5c c6 ff 0f 0b 48 89 d1 48 c7 c7 e8 20 17 89 48 89 f2 48 89 c6 31 c0 e8 e0 5c c6 ff <0f> 0b 66 2e 0f 1f 84 00 00 00 00 00 48 8b 07 48 b9 00 01 00 00 00
> [64646.943442] RSP: 0018:ffffa78a49137d90 EFLAGS: 00010246
> [64646.943687] RAX: 0000000000000075 RBX: ffff9c1ff7ee5a00 RCX: 0000000000000000
> [64646.944021] RDX: 0000000000000000 RSI: ffff9c0fffe59d28 RDI: ffff9c0fffe59d28
> [64646.944356] RBP: ffffa78a49137df8 R08: 00000000000006ad R09: ffffffff88ec3be0
> [64646.944691] R10: 000000000000000f R11: 0000000007070707 R12: ffff9c1feb2bc600
> [64646.945025] R13: ffff9c1feb2bc7c8 R14: ffff9c1ff7ee5468 R15: ffff9c1ff7ee5a68
> [64646.945360] FS:  0000000000000000(0000) GS:ffff9c0fffe40000(0000) knlGS:0000000000000000
> [64646.945739] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [64646.946008] CR2: 00007f4423eb7004 CR3: 000000169940a005 CR4: 00000000007606e0
> [64646.946343] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [64646.946677] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [64646.947012] PKRU: 55555554
> [64646.947138] Call Trace:
> [64646.947260]  io_issue_sqe+0x115/0xa30
> [64646.947429]  io_wq_submit_work+0xb5/0x1d0
> [64646.947615]  io_worker_handle_work+0x19d/0x4c0
> [64646.947823]  io_wqe_worker+0xdc/0x390
> [64646.947998]  kthread+0xf8/0x130
> [64646.948141]  ? io_wq_for_each_worker+0xb0/0xb0
> [64646.948349]  ? kthread_bind+0x10/0x10
> [64646.948522]  ret_from_fork+0x35/0x40

I think you want to check that you have these in your tree:

commit 11ba820bf163e224bf5dd44e545a66a44a5b1d7a
Author: Jens Axboe <axboe@xxxxxxxxx>
Date:   Wed Jan 15 21:51:17 2020 -0700

    io_uring: ensure workqueue offload grabs ring mutex for poll list

and

commit 797f3f535d59f05ad12c629338beef6cb801d19e
Author: Bijan Mottahedeh <bijan.mottahedeh@xxxxxxxxxx>
Date:   Wed Jan 15 18:37:45 2020 -0800

    io_uring: clear req->result always before issuing a read/write request

--
Jens Axboe
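
For anyone hitting the same fault, one quick way to check whether those two fixes are already in the checked-out branch is to search the history by subject line. This is only a sketch; it matches on the commit subjects rather than the SHAs quoted above, since the hashes can differ if the patches were cherry-picked into the nvme tree:

    # run from the kernel source tree; each command prints a match if the fix is present
    git log --oneline --grep='io_uring: ensure workqueue offload grabs ring mutex for poll list'
    git log --oneline --grep='io_uring: clear req->result always before issuing a read/write request'

If either command prints nothing, that fix is missing and should be pulled in or applied before re-running the fio test.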