On Tue, 2018-12-11 at 17:02 -0700, Jens Axboe wrote:
> On 12/11/18 3:58 PM, Bart Van Assche wrote:
> > Hi Jens,
> >
> > If I run the following subset of blktests:
> >
> >   while :; do ./check -q srp && ./check -q nvmeof-mp; done
> >
> > against today's for-next branch (commit dd2bf2df85a7) then after some
> > time the following hang is reported:
> >
> > INFO: task fio:14869 blocked for more than 120 seconds.
> >       Not tainted 4.20.0-rc6-dbg+ #1
> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > fio             D25272 14869  14195 0x00000000
> > Call Trace:
> >  __schedule+0x401/0xe50
> >  schedule+0x4e/0xd0
> >  io_schedule+0x21/0x50
> >  blk_mq_get_tag+0x46d/0x640
> >  blk_mq_get_request+0x7c0/0xa00
> >  blk_mq_make_request+0x241/0xa70
> >  generic_make_request+0x411/0x950
> >  submit_bio+0x9b/0x250
> >  blkdev_direct_IO+0x7fb/0x870
> >  generic_file_direct_write+0x119/0x210
> >  __generic_file_write_iter+0x11c/0x280
> >  blkdev_write_iter+0x13c/0x220
> >  aio_write+0x204/0x310
> >  io_submit_one+0x9c6/0xe70
> >  __x64_sys_io_submit+0x115/0x340
> >  do_syscall_64+0x71/0x210
> >  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> >
> > When that hang occurs my list-pending-block-requests script does not show
> > any pending requests:
> >
> > # list-pending-block-requests
> > dm-0
> > loop0
> > loop1
> > loop2
> > loop3
> > loop4
> > loop5
> > loop6
> > loop7
> > nullb0
> > nullb1
> > sda
> > sdb
> > sdc
> > sdd
> > vda
> > vdb
> >
> > Enabling fail_if_no_path mode did not resolve the hang so I don't think
> > that the root cause is in any of the dm drivers used in this test:
> >
> > # dmsetup ls | while read dm rest; do dmsetup message $dm 0 fail_if_no_path; done; dmsetup remove_all; dmsetup table
> > 360014056e756c6c62300000000000000: 0 65536 multipath 0 1 alua 1 1 service-time 0 1 2 8:16 1 1
> >
> > The same test passes against kernel v4.20-rc6.
>
> What device is this being run on?

Older versions of the srp and nvmeof-mp tests used the brd block device.
Today these tests use null_blk with memory_backed set to 1. See also
configure_null_blk() in common/multipath-over-rdma. null_blk is accessed
by ib_srpt. The dm-mpath driver is stacked on top of the ib_srp instance
that communicates with ib_srpt driver. The ib_srp and ib_srpt drivers
communicate with each other over the loopback functionality of the
rdma_rxe driver.

Bart.
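
For readers who have not set up null_blk with memory_backed=1 before, the bottom layer of the stack described above could be created roughly as follows. This is only a sketch of the general configfs approach, not a copy of configure_null_blk() from blktests. The function name, the device name and the size are placeholders, and it assumes root, a mounted configfs and a null_blk module loaded with nr_devices=0. The 32 MiB default was picked only because it equals the 65536 512-byte sectors shown in the quoted dmsetup table.

```shell
#!/bin/sh
# Sketch only: create one memory-backed null_blk device through configfs.
# configure_null_blk_sketch is a hypothetical name, not the blktests helper.
# Assumes root, a mounted configfs and "modprobe null_blk nr_devices=0".
configure_null_blk_sketch() {
    cfs_root="${1:-/sys/kernel/config/nullb}"  # null_blk configfs directory
    dev="${2:-nullb0}"                         # placeholder device name
    size_mb="${3:-32}"                         # placeholder size in MiB

    # On configfs, creating the directory instantiates the device entry.
    mkdir -p "$cfs_root/$dev" || return 1

    # Set the size, keep written data in memory, then power the device on.
    echo "$size_mb" > "$cfs_root/$dev/size" &&
    echo 1 > "$cfs_root/$dev/memory_backed" &&
    echo 1 > "$cfs_root/$dev/power"
}

# Example call (needs root, so it is left commented out):
# configure_null_blk_sketch /sys/kernel/config/nullb nullb0 32
```

The configfs root is a parameter so that the function can be pointed at a scratch directory for a dry run. The ib_srpt, ib_srp, dm-mpath and rdma_rxe layers on top of the device are not covered by this sketch.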