v4.11-rc blk-mq lockup?

Hello Jens,

If I leave the srp-test software running for a few minutes using the
following command:

# while ~bart/software/infiniband/srp-test/run_tests -d -r 30; do :; done

then after some time the following complaint appears for multiple
kworkers:

INFO: task kworker/9:0:65 blocked for more than 480 seconds.
      Tainted: G          I     4.11.0-rc4-dbg+ #5
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kworker/9:0     D    0    65      2 0x00000000
Workqueue: dio/dm-0 dio_aio_complete_work
Call Trace:
 __schedule+0x3df/0xc10
 schedule+0x38/0x90
 rwsem_down_write_failed+0x2c4/0x4c0
 call_rwsem_down_write_failed+0x17/0x30
 down_write+0x5a/0x70
 __generic_file_fsync+0x43/0x90
 ext4_sync_file+0x2d0/0x550
 vfs_fsync_range+0x46/0xa0
 dio_complete+0x181/0x1b0
 dio_aio_complete_work+0x17/0x20
 process_one_work+0x208/0x6a0
 worker_thread+0x49/0x4a0
 kthread+0x107/0x140
 ret_from_fork+0x2e/0x40

I have not observed this behavior with kernel v4.10 or older. If this
happens and I check the queue state with the following script:

#!/bin/bash
# Dump the blk-mq state of every request queue that still has work pending.

cd /sys/class/block || exit $?
for dev in *; do
    [ -e "$dev/mq" ] || continue
    echo "$dev"
    # sysfs: report pending lists and per-CPU rq_lists that have at least
    # two lines of content.
    for f in "$dev"/mq/*/{pending,*/rq_list}; do
        [ -e "$f" ] || continue
        if { read -r line1 && read -r line2; } <"$f"; then
            echo "$f"
            head -n 9 "$f"
        fi
    done
    # debugfs (requires root and a mounted debugfs): for every hardware
    # queue with busy tags, dump the dispatch list, the tag state and the
    # per-CPU software queue rq_lists.
    (
        cd /sys/kernel/debug/block >&/dev/null &&
        for d in "$dev"/mq/*; do
            [ -d "$d" ] || continue
            grep -q '^busy=0$' "$d/tags" && continue
            for f in "$d"/{dispatch,tags*,cpu*/rq_list}; do
                [ -e "$f" ] && grep -aH '' "$f"
            done
        done
    )
done

then the following output appears:

sda
sdb
sdc
sdd
sdd/mq/3/dispatch:ffff880401655d00 {.cmd_flags=0xca01, .rq_flags=0x2040, .tag=59, .internal_tag=-1}
sdd/mq/3/tags:nr_tags=62
sdd/mq/3/tags:nr_reserved_tags=0
sdd/mq/3/tags:active_queues=0
sdd/mq/3/tags:
sdd/mq/3/tags:bitmap_tags:
sdd/mq/3/tags:depth=62
sdd/mq/3/tags:busy=31
sdd/mq/3/tags:bits_per_word=8
sdd/mq/3/tags:map_nr=8
sdd/mq/3/tags:alloc_hint={23, 23, 52, 1, 55, 29, 17, 22, 34, 48, 25, 49, 37, 43, 58, 25, 6, 20, 50, 14, 55, 7, 21, 17, 26, 36, 43, 43, 4, 6, 3, 47}
sdd/mq/3/tags:wake_batch=7
sdd/mq/3/tags:wake_index=0
sdd/mq/3/tags:ws={
sdd/mq/3/tags:	{.wait_cnt=7, .wait=inactive},
sdd/mq/3/tags:	{.wait_cnt=7, .wait=inactive},
sdd/mq/3/tags:	{.wait_cnt=7, .wait=inactive},
sdd/mq/3/tags:	{.wait_cnt=7, .wait=inactive},
sdd/mq/3/tags:	{.wait_cnt=7, .wait=inactive},
sdd/mq/3/tags:	{.wait_cnt=7, .wait=inactive},
sdd/mq/3/tags:	{.wait_cnt=7, .wait=inactive},
sdd/mq/3/tags:	{.wait_cnt=7, .wait=inactive},
sdd/mq/3/tags:}
sdd/mq/3/tags:round_robin=0
sdd/mq/3/tags_bitmap:00000000: ffff ff1f 0000 0018
sdd/mq/3/cpu5/rq_list:ffff880401657440 {.cmd_flags=0x7a01, .rq_flags=0x2040, .tag=60, .internal_tag=-1}
sdd/mq/3/cpu5/rq_list:ffff88037aba0000 {.cmd_flags=0x7a01, .rq_flags=0x2040, .tag=0, .internal_tag=-1}
sdd/mq/3/cpu5/rq_list:ffff88037aba1740 {.cmd_flags=0x7a01, .rq_flags=0x2040, .tag=1, .internal_tag=-1}
sdd/mq/3/cpu5/rq_list:ffff88037aba2e80 {.cmd_flags=0xca01, .rq_flags=0x2040, .tag=2, .internal_tag=-1}
sdd/mq/3/cpu5/rq_list:ffff88037aba45c0 {.cmd_flags=0xca01, .rq_flags=0x2040, .tag=3, .internal_tag=-1}
sdd/mq/3/cpu5/rq_list:ffff88037aba5d00 {.cmd_flags=0x7a01, .rq_flags=0x2040, .tag=4, .internal_tag=-1}
sdd/mq/3/cpu5/rq_list:ffff88037aba7440 {.cmd_flags=0xca01, .rq_flags=0x2040, .tag=5, .internal_tag=-1}
sdd/mq/3/cpu5/rq_list:ffff88037aba8b80 {.cmd_flags=0x7a01, .rq_flags=0x2040, .tag=6, .internal_tag=-1}
sdd/mq/3/cpu5/rq_list:ffff88037abaa2c0 {.cmd_flags=0xca01, .rq_flags=0x2040, .tag=7, .internal_tag=-1}
sdd/mq/3/cpu5/rq_list:ffff88037ababa00 {.cmd_flags=0xca01, .rq_flags=0x2040, .tag=8, .internal_tag=-1}
sdd/mq/3/cpu5/rq_list:ffff88037abad140 {.cmd_flags=0xca01, .rq_flags=0x2040, .tag=9, .internal_tag=-1}
sdd/mq/3/cpu5/rq_list:ffff88037abae880 {.cmd_flags=0xca01, .rq_flags=0x2040, .tag=10, .internal_tag=-1}
sdd/mq/3/cpu5/rq_list:ffff880369900000 {.cmd_flags=0xca01, .rq_flags=0x2040, .tag=11, .internal_tag=-1}
sdd/mq/3/cpu5/rq_list:ffff880369901740 {.cmd_flags=0xca01, .rq_flags=0x2040, .tag=12, .internal_tag=-1}
sdd/mq/3/cpu5/rq_list:ffff880369902e80 {.cmd_flags=0xca01, .rq_flags=0x2040, .tag=13, .internal_tag=-1}
sdd/mq/3/cpu5/rq_list:ffff8803699045c0 {.cmd_flags=0x7a01, .rq_flags=0x2040, .tag=14, .internal_tag=-1}
sdd/mq/3/cpu5/rq_list:ffff880369905d00 {.cmd_flags=0x7a01, .rq_flags=0x2040, .tag=15, .internal_tag=-1}
sdd/mq/3/cpu5/rq_list:ffff880369907440 {.cmd_flags=0x7a01, .rq_flags=0x2040, .tag=16, .internal_tag=-1}
sdd/mq/3/cpu5/rq_list:ffff880369908b80 {.cmd_flags=0xca01, .rq_flags=0x2040, .tag=17, .internal_tag=-1}
sdd/mq/3/cpu5/rq_list:ffff88036990a2c0 {.cmd_flags=0x7a01, .rq_flags=0x2040, .tag=18, .internal_tag=-1}
sdd/mq/3/cpu5/rq_list:ffff88036990ba00 {.cmd_flags=0x7a01, .rq_flags=0x2040, .tag=19, .internal_tag=-1}
sdd/mq/3/cpu5/rq_list:ffff88036990d140 {.cmd_flags=0x7a01, .rq_flags=0x2040, .tag=20, .internal_tag=-1}
sdd/mq/3/cpu5/rq_list:ffff88036990e880 {.cmd_flags=0x7a01, .rq_flags=0x2040, .tag=21, .internal_tag=-1}
sdd/mq/3/cpu5/rq_list:ffff8804001b0000 {.cmd_flags=0xca01, .rq_flags=0x2040, .tag=22, .internal_tag=-1}
sdd/mq/3/cpu5/rq_list:ffff8804001b1740 {.cmd_flags=0xca01, .rq_flags=0x2040, .tag=23, .internal_tag=-1}
sdd/mq/3/cpu5/rq_list:ffff8804001b2e80 {.cmd_flags=0xca01, .rq_flags=0x2040, .tag=24, .internal_tag=-1}
sdd/mq/3/cpu5/rq_list:ffff8804001b45c0 {.cmd_flags=0xca01, .rq_flags=0x2040, .tag=25, .internal_tag=-1}
sdd/mq/3/cpu5/rq_list:ffff8804001b5d00 {.cmd_flags=0x7a01, .rq_flags=0x2040, .tag=26, .internal_tag=-1}
sdd/mq/3/cpu5/rq_list:ffff8804001b7440 {.cmd_flags=0xca01, .rq_flags=0x2040, .tag=27, .internal_tag=-1}
sdd/mq/3/cpu5/rq_list:ffff8804001b8b80 {.cmd_flags=0x7a01, .rq_flags=0x2040, .tag=28, .internal_tag=-1}
sde
sde/mq/3/tags:nr_tags=62
sde/mq/3/tags:nr_reserved_tags=0
sde/mq/3/tags:active_queues=0
sde/mq/3/tags:
sde/mq/3/tags:bitmap_tags:
sde/mq/3/tags:depth=62
sde/mq/3/tags:busy=31
sde/mq/3/tags:bits_per_word=8
sde/mq/3/tags:map_nr=8
sde/mq/3/tags:alloc_hint={23, 23, 52, 1, 55, 29, 17, 22, 34, 48, 25, 49, 37, 43, 58, 25, 6, 20, 50, 14, 55, 7, 21, 17, 26, 36, 43, 43, 4, 6, 3, 47}
sde/mq/3/tags:wake_batch=7
sde/mq/3/tags:wake_index=0
sde/mq/3/tags:ws={
sde/mq/3/tags:	{.wait_cnt=7, .wait=inactive},
sde/mq/3/tags:	{.wait_cnt=7, .wait=inactive},
sde/mq/3/tags:	{.wait_cnt=7, .wait=inactive},
sde/mq/3/tags:	{.wait_cnt=7, .wait=inactive},
sde/mq/3/tags:	{.wait_cnt=7, .wait=inactive},
sde/mq/3/tags:	{.wait_cnt=7, .wait=inactive},
sde/mq/3/tags:	{.wait_cnt=7, .wait=inactive},
sde/mq/3/tags:	{.wait_cnt=7, .wait=inactive},
sde/mq/3/tags:}
sde/mq/3/tags:round_robin=0
sde/mq/3/tags_bitmap:00000000: ffff ff1f 0000 0018
sdf
sdg
sdh
sdi
sdj
sr0

I am using the "none" scheduler:

# cat /sys/class/block/sdd/queue/scheduler
[none]
# cat /sys/class/block/sde/queue/scheduler
[none]

What is remarkable is that I see pending requests for the sd* devices but
not for any dm* device, and that the number of busy requests (31) is exactly
half of the queue depth (62). Could this indicate that the block layer
stopped processing these blk-mq queues?
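
As a quick sanity check, the bits set in the tags_bitmap words shown above
can be counted with a throwaway helper like the one below (my own script,
not part of srp-test); the total matches the reported busy count:

#!/bin/bash
# Count the bits set in the sbitmap words from the tags_bitmap dump above;
# the total should equal the "busy=" value reported in the tags file.
bits=0
for word in ffff ff1f 0000 0018; do
    val=$((16#$word))
    while ((val)); do ((bits += val & 1, val >>= 1)); done
done
echo "$bits"    # prints 31 for the sdd and sde hctx 3 bitmaps above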

If this happens and I run the following command to trigger SRP logout:

# for p in /sys/class/srp_remote_ports/*; do echo 1 >$p/delete; done

then the test that was running finishes and reports that removing the
multipath device failed, and echo w >/proc/sysrq-trigger produces the
following output:

sysrq: SysRq : Show Blocked State
  task                        PC stack   pid father
systemd-udevd   D    0 14490    508 0x00000106
Call Trace:
 __schedule+0x3df/0xc10
 schedule+0x38/0x90
 io_schedule+0x11/0x40
 __lock_page+0x10c/0x140
 truncate_inode_pages_range+0x45d/0x780
 truncate_inode_pages+0x10/0x20
 kill_bdev+0x30/0x40
 __blkdev_put+0x71/0x220
 blkdev_put+0x49/0x170
 blkdev_close+0x20/0x30
 __fput+0xe8/0x1f0
 ____fput+0x9/0x10
 task_work_run+0x80/0xb0
 do_exit+0x30c/0xc70
 do_group_exit+0x4b/0xc0
 get_signal+0x2c2/0x930
 do_signal+0x23/0x670
 exit_to_usermode_loop+0x5d/0xa0
 do_syscall_64+0xd5/0x140
 entry_SYSCALL64_slow_path+0x25/0x25

Does this indicate that truncate_inode_pages_range() is waiting
because a block layer queue got stuck?

The kernel tree I used in my tests is the result of merging the
following commits:
* commit 3dca2c2f3d3b from git://git.kernel.dk/linux-block.git
  ("Merge branch 'for-4.12/block' into for-next")
* commit f88ab0c4b481 from git://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi.git
  ("scsi: libsas: fix ata xfer length")
* commit ad0376eb1483 from git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
  ("Merge tag 'edac_for_4.11_2' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp")

Please let me know if you need more information.

Thanks,

Bart.


