Re: for-next hangs on test srp/012

On Tue, 2018-12-11 at 17:02 -0700, Jens Axboe wrote:
> On 12/11/18 3:58 PM, Bart Van Assche wrote:
> > Hi Jens,
> >
> > If I run the following subset of blktests:
> >
> >   while :; do ./check -q srp && ./check -q nvmeof-mp; done
> >
> > against today's for-next branch (commit dd2bf2df85a7) then after some
> > time the following hang is reported:
> >
> > INFO: task fio:14869 blocked for more than 120 seconds.
> >       Not tainted 4.20.0-rc6-dbg+ #1
> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > fio             D25272 14869  14195 0x00000000
> > Call Trace:
> >  __schedule+0x401/0xe50
> >  schedule+0x4e/0xd0
> >  io_schedule+0x21/0x50
> >  blk_mq_get_tag+0x46d/0x640
> >  blk_mq_get_request+0x7c0/0xa00
> >  blk_mq_make_request+0x241/0xa70
> >  generic_make_request+0x411/0x950
> >  submit_bio+0x9b/0x250
> >  blkdev_direct_IO+0x7fb/0x870
> >  generic_file_direct_write+0x119/0x210
> >  __generic_file_write_iter+0x11c/0x280
> >  blkdev_write_iter+0x13c/0x220
> >  aio_write+0x204/0x310
> >  io_submit_one+0x9c6/0xe70
> >  __x64_sys_io_submit+0x115/0x340
> >  do_syscall_64+0x71/0x210
> >  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> >
> > When that hang occurs my list-pending-block-requests script does not show
> > any pending requests:
> >
> > # list-pending-block-requests
> > dm-0
> > loop0
> > loop1
> > loop2
> > loop3
> > loop4
> > loop5
> > loop6
> > loop7
> > nullb0
> > nullb1
> > sda
> > sdb
> > sdc
> > sdd
> > vda
> > vdb
> >
> > Enabling fail_if_no_path mode did not resolve the hang, so I don't think
> > that the root cause is in any of the dm drivers used in this test:
> >
> > # dmsetup ls | while read dm rest; do dmsetup message $dm 0 fail_if_no_path; done; dmsetup remove_all; dmsetup table
> > 360014056e756c6c62300000000000000: 0 65536 multipath 0 1 alua 1 1 service-time 0 1 2 8:16 1 1
> >
> > The same test passes against kernel v4.20-rc6.
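The list-pending-block-requests script itself is not included in the message. As a rough, hypothetical sketch only, a helper with the same output shape could print each block device name and, indented below it, the contents of any non-empty blk-mq debugfs hctx*/busy file. This assumes debugfs is mounted at /sys/kernel/debug and that the running kernel exposes the per-hctx "busy" attribute; the function name list_pending_block_requests and the DEBUGFS_BLOCK override are made up for illustration and are not Bart's actual script.

```shell
# Hypothetical sketch, not the actual list-pending-block-requests script.
# Assumes blk-mq debugfs at /sys/kernel/debug/block (override with DEBUGFS_BLOCK).
# The body runs in a subshell so its variables do not leak into the caller.
list_pending_block_requests() (
    root=${DEBUGFS_BLOCK:-/sys/kernel/debug/block}
    for dev in "$root"/*/; do
        [ -d "$dev" ] || continue      # glob matched nothing
        dev=${dev%/}
        echo "${dev##*/}"              # print the device name, e.g. nullb0
        for f in "$dev"/hctx*/busy; do
            [ -s "$f" ] || continue    # missing or empty: nothing in flight
            sed 's/^/    /' "$f"       # indent each busy request under its device
        done
    done
)
```

Run as root on an idle system, such a helper prints only device names, which is the shape of the output quoted above.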
>
> What device is this being run on?

Older versions of the srp and nvmeof-mp tests used the brd block device.
Today these tests use null_blk with memory_backed set to 1. See also
configure_null_blk() in common/multipath-over-rdma. null_blk is accessed
by ib_srpt. The dm-mpath driver is stacked on top of the ib_srp instance
that communicates with the ib_srpt driver. The ib_srp and ib_srpt drivers
communicate with each other over the loopback functionality of the
rdma_rxe driver.
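configure_null_blk() is not quoted here. Purely as an illustration, a hypothetical shell function that creates one memory-backed null_blk device through configfs might look like the following. It assumes configfs is mounted at /sys/kernel/config and that null_blk was loaded with nr_devices=0; memory_backed, size (in MB) and power are null_blk configfs attributes, while the function name create_memory_backed_nullb and the NULLB_CONFIGFS override are invented for the example and are not taken from blktests.

```shell
# Hypothetical sketch, not blktests' configure_null_blk().
# Assumes configfs at /sys/kernel/config (override with NULLB_CONFIGFS) and
# null_blk loaded with nr_devices=0. Usage: create_memory_backed_nullb NAME SIZE_MB
create_memory_backed_nullb() (
    name=$1 size_mb=$2
    cfs=${NULLB_CONFIGFS:-/sys/kernel/config/nullb}
    dir=$cfs/$name
    mkdir "$dir" || exit 1
    echo 1 > "$dir/memory_backed"   # store written data in RAM so reads return it
    echo "$size_mb" > "$dir/size"   # device capacity in MB
    echo 1 > "$dir/power"           # register the block device; set other attributes first
)
```

Setting memory_backed before power matters because null_blk only accepts most attribute changes while the device is powered off.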

Bart.
