Re: v4.11-rc blk-mq lockup?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, 2017-03-28 at 08:06 -0600, Jens Axboe wrote:
> On Mon, Mar 27 2017, Bart Van Assche wrote:
> > Hello Jens,
> > 
> > If I leave the srp-test software running for a few minutes using the
> > following command:
> > 
> > # while ~bart/software/infiniband/srp-test/run_tests -d -r 30; do :; done
> > 
> > then after some time the following complaint appears for multiple
> > kworkers:
> > 
> > INFO: task kworker/9:0:65 blocked for more than 480 seconds.
> >       Tainted: G          I     4.11.0-rc4-dbg+ #5
> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > kworker/9:0     D    0    65      2 0x00000000
> > Workqueue: dio/dm-0 dio_aio_complete_work
> > Call Trace:
> >  __schedule+0x3df/0xc10
> >  schedule+0x38/0x90
> >  rwsem_down_write_failed+0x2c4/0x4c0
> >  call_rwsem_down_write_failed+0x17/0x30
> >  down_write+0x5a/0x70
> >  __generic_file_fsync+0x43/0x90
> >  ext4_sync_file+0x2d0/0x550
> >  vfs_fsync_range+0x46/0xa0
> >  dio_complete+0x181/0x1b0
> >  dio_aio_complete_work+0x17/0x20
> >  process_one_work+0x208/0x6a0
> >  worker_thread+0x49/0x4a0
> >  kthread+0x107/0x140
> >  ret_from_fork+0x2e/0x40
> > 
> > I had not yet observed this behavior with kernel v4.10 or older. If this
> > happens and I check the queue state with the following script:
> 
> Can you include the 'state' file in your script?
> 
> Do you know when this started happening? You say it doesn't happen in
> 4.10, but did it pass earlier in the 4.11-rc cycle?
> 
> Does it reproduce with dm?
> 
> I can't tell from your report if this is new in the 4.11 series,
> 
> > The kernel tree I used in my tests is the result of merging the
> > following commits:
> > * commit 3dca2c2f3d3b from git://git.kernel.dk/linux-block.git
> >   ("Merge branch 'for-4.12/block' into for-next")
> > * commit f88ab0c4b481 from git://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi.git
> >   ("scsi: libsas: fix ata xfer length")
> > * commit ad0376eb1483 from git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
> >   ("Merge tag 'edac_for_4.11_2' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp")
> 
> Can we try and isolate it a bit - -rc4 alone, for instance?

Hello Jens,

Sorry but performing a bisect would be hard: without recent SCSI and block
layer fixes this test triggers other failures before the lockup reported in
this e-mail is triggered. See e.g.
https://marc.info/?l=linux-scsi&m=148979716822799.

I do not know whether it would be possible to modify the test such that only
the dm driver is involved but no SCSI code.

When I reran the test this morning the hang was triggered by the 02-sq-on-mq
test. This means that dm was used in blk-sq mode and that blk-mq was used for
the ib_srp SCSI device instances.

Please find below the updated script and its output.

---

#!/bin/bash

show_state() {
    local a dev=$1

    for a in device/state queue/scheduler; do
	[ -e "$dev/$a" ] && grep -aH '' "$dev/$a"
    done
}

cd /sys/class/block || exit $?
for dev in *; do
    if [ -e "$dev/mq" ]; then
	echo "$dev"
	pending=0
	for f in "$dev"/mq/*/{pending,*/rq_list}; do
	    [ -e "$f" ] || continue
	    if { read -r line1 && read -r line2; } <"$f"; then
		echo "$f"
		echo "$line1 $line2" >/dev/null
		head -n 9 "$f"
		((pending++))
	    fi
	done
	(
	    busy=0
	    cd /sys/kernel/debug/block >&/dev/null &&
	    for d in "$dev"/mq/*; do
		[ ! -d "$d" ] && continue
		grep -q '^busy=0$' "$d/tags" && continue
		((busy++))
	        for f in "$d"/{dispatch,tags*,cpu*/rq_list}; do
		    [ -e "$f" ] && grep -aH '' "$f"
		done
	    done
	    exit $busy
	)
	pending=$((pending+$?))
	[ "$pending" -gt 0 ] && show_state "$dev"
    fi
done

---

sda
sdb
sdc
sdd
sde
sde/mq/2/dispatch:ffff8803f5b65d00 {.cmd_flags=0xca01, .rq_flags=0x2040, .tag=37, .internal_tag=-1}
sde/mq/2/tags:nr_tags=62
sde/mq/2/tags:nr_reserved_tags=0
sde/mq/2/tags:active_queues=0
sde/mq/2/tags:
sde/mq/2/tags:bitmap_tags:
sde/mq/2/tags:depth=62
sde/mq/2/tags:busy=31
sde/mq/2/tags:bits_per_word=8
sde/mq/2/tags:map_nr=8
sde/mq/2/tags:alloc_hint={54, 43, 44, 43, 22, 42, 52, 4, 10, 7, 16, 32, 11, 17, 44, 26, 51, 59, 9, 45, 9, 55, 10, 44, 22, 46, 25, 25, 21, 18, 52, 32}
sde/mq/2/tags:wake_batch=7
sde/mq/2/tags:wake_index=0
sde/mq/2/tags:ws={
sde/mq/2/tags:	{.wait_cnt=7, .wait=inactive},
sde/mq/2/tags:	{.wait_cnt=7, .wait=inactive},
sde/mq/2/tags:	{.wait_cnt=7, .wait=inactive},
sde/mq/2/tags:	{.wait_cnt=7, .wait=inactive},
sde/mq/2/tags:	{.wait_cnt=7, .wait=inactive},
sde/mq/2/tags:	{.wait_cnt=7, .wait=inactive},
sde/mq/2/tags:	{.wait_cnt=7, .wait=inactive},
sde/mq/2/tags:	{.wait_cnt=7, .wait=inactive},
sde/mq/2/tags:}
sde/mq/2/tags:round_robin=0
sde/mq/2/tags_bitmap:00000000: 7f00 0000 e0ff ff1f
sde/mq/2/cpu9/rq_list:ffff8803f5b67440 {.cmd_flags=0x7a01, .rq_flags=0x2040, .tag=38, .internal_tag=-1}
sde/mq/2/cpu9/rq_list:ffff8803f5b68b80 {.cmd_flags=0xca01, .rq_flags=0x2040, .tag=39, .internal_tag=-1}
sde/mq/2/cpu9/rq_list:ffff8803f5b6a2c0 {.cmd_flags=0xca01, .rq_flags=0x2040, .tag=40, .internal_tag=-1}
sde/mq/2/cpu9/rq_list:ffff8803f5b6ba00 {.cmd_flags=0xca01, .rq_flags=0x2040, .tag=41, .internal_tag=-1}
sde/mq/2/cpu9/rq_list:ffff8803f5b6d140 {.cmd_flags=0xca01, .rq_flags=0x2040, .tag=42, .internal_tag=-1}
sde/mq/2/cpu9/rq_list:ffff8803f5b6e880 {.cmd_flags=0x7a01, .rq_flags=0x2040, .tag=43, .internal_tag=-1}
sde/mq/2/cpu9/rq_list:ffff880373ac0000 {.cmd_flags=0x7a01, .rq_flags=0x2040, .tag=44, .internal_tag=-1}
sde/mq/2/cpu9/rq_list:ffff880373ac1740 {.cmd_flags=0x7a01, .rq_flags=0x2040, .tag=45, .internal_tag=-1}
sde/mq/2/cpu9/rq_list:ffff880373ac2e80 {.cmd_flags=0x7a01, .rq_flags=0x2040, .tag=46, .internal_tag=-1}
sde/mq/2/cpu9/rq_list:ffff880373ac45c0 {.cmd_flags=0x7a01, .rq_flags=0x2040, .tag=47, .internal_tag=-1}
sde/mq/2/cpu9/rq_list:ffff880373ac5d00 {.cmd_flags=0x7a01, .rq_flags=0x2040, .tag=48, .internal_tag=-1}
sde/mq/2/cpu9/rq_list:ffff880373ac7440 {.cmd_flags=0x7a01, .rq_flags=0x2040, .tag=49, .internal_tag=-1}
sde/mq/2/cpu9/rq_list:ffff880373ac8b80 {.cmd_flags=0xca01, .rq_flags=0x2040, .tag=50, .internal_tag=-1}
sde/mq/2/cpu9/rq_list:ffff880373aca2c0 {.cmd_flags=0xca01, .rq_flags=0x2040, .tag=51, .internal_tag=-1}
sde/mq/2/cpu9/rq_list:ffff880373acba00 {.cmd_flags=0x7a01, .rq_flags=0x2040, .tag=52, .internal_tag=-1}
sde/mq/2/cpu9/rq_list:ffff880373acd140 {.cmd_flags=0x7a01, .rq_flags=0x2040, .tag=53, .internal_tag=-1}
sde/mq/2/cpu9/rq_list:ffff880373ace880 {.cmd_flags=0x7a01, .rq_flags=0x2040, .tag=54, .internal_tag=-1}
sde/mq/2/cpu9/rq_list:ffff8803f4950000 {.cmd_flags=0x7a01, .rq_flags=0x2040, .tag=55, .internal_tag=-1}
sde/mq/2/cpu9/rq_list:ffff8803f4951740 {.cmd_flags=0x7a01, .rq_flags=0x2040, .tag=56, .internal_tag=-1}
sde/mq/2/cpu9/rq_list:ffff8803f4952e80 {.cmd_flags=0x7a01, .rq_flags=0x2040, .tag=57, .internal_tag=-1}
sde/mq/2/cpu9/rq_list:ffff8803f49545c0 {.cmd_flags=0xca01, .rq_flags=0x2040, .tag=58, .internal_tag=-1}
sde/mq/2/cpu9/rq_list:ffff8803f4955d00 {.cmd_flags=0x7a01, .rq_flags=0x2040, .tag=59, .internal_tag=-1}
sde/mq/2/cpu9/rq_list:ffff8803f4957440 {.cmd_flags=0x7a01, .rq_flags=0x2040, .tag=60, .internal_tag=-1}
sde/mq/2/cpu9/rq_list:ffff88036bfe0000 {.cmd_flags=0xca01, .rq_flags=0x2040, .tag=0, .internal_tag=-1}
sde/mq/2/cpu9/rq_list:ffff88036bfe1740 {.cmd_flags=0x7a01, .rq_flags=0x2040, .tag=1, .internal_tag=-1}
sde/mq/2/cpu9/rq_list:ffff88036bfe2e80 {.cmd_flags=0xca01, .rq_flags=0x2040, .tag=2, .internal_tag=-1}
sde/mq/2/cpu9/rq_list:ffff88036bfe45c0 {.cmd_flags=0x7a01, .rq_flags=0x2040, .tag=3, .internal_tag=-1}
sde/mq/2/cpu9/rq_list:ffff88036bfe5d00 {.cmd_flags=0x7a01, .rq_flags=0x2040, .tag=4, .internal_tag=-1}
sde/mq/2/cpu9/rq_list:ffff88036bfe7440 {.cmd_flags=0x7a01, .rq_flags=0x2040, .tag=5, .internal_tag=-1}
sde/mq/2/cpu9/rq_list:ffff88036bfe8b80 {.cmd_flags=0x7a01, .rq_flags=0x2040, .tag=6, .internal_tag=-1}
sde/device/state:running
sde/queue/scheduler:[none] 
sdf
sdf/mq/2/tags:nr_tags=62
sdf/mq/2/tags:nr_reserved_tags=0
sdf/mq/2/tags:active_queues=0
sdf/mq/2/tags:
sdf/mq/2/tags:bitmap_tags:
sdf/mq/2/tags:depth=62
sdf/mq/2/tags:busy=31
sdf/mq/2/tags:bits_per_word=8
sdf/mq/2/tags:map_nr=8
sdf/mq/2/tags:alloc_hint={54, 43, 44, 43, 22, 42, 52, 4, 10, 7, 16, 32, 11, 17, 44, 26, 51, 59, 9, 45, 9, 55, 10, 44, 22, 46, 25, 25, 21, 18, 52, 32}
sdf/mq/2/tags:wake_batch=7
sdf/mq/2/tags:wake_index=0
sdf/mq/2/tags:ws={
sdf/mq/2/tags:	{.wait_cnt=7, .wait=inactive},
sdf/mq/2/tags:	{.wait_cnt=7, .wait=inactive},
sdf/mq/2/tags:	{.wait_cnt=7, .wait=inactive},
sdf/mq/2/tags:	{.wait_cnt=7, .wait=inactive},
sdf/mq/2/tags:	{.wait_cnt=7, .wait=inactive},
sdf/mq/2/tags:	{.wait_cnt=7, .wait=inactive},
sdf/mq/2/tags:	{.wait_cnt=7, .wait=inactive},
sdf/mq/2/tags:	{.wait_cnt=7, .wait=inactive},
sdf/mq/2/tags:}
sdf/mq/2/tags:round_robin=0
sdf/mq/2/tags_bitmap:00000000: 7f00 0000 e0ff ff1f
sdf/device/state:running
sdf/queue/scheduler:[none] 
sdg
sdh
sdi
sdj
sr0




[Index of Archives]     [Linux RAID]     [Linux SCSI]     [Linux ATA RAID]     [IDE]     [Linux Wireless]     [Linux Kernel]     [ATH6KL]     [Linux Bluetooth]     [Linux Netdev]     [Kernel Newbies]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Device Mapper]

  Powered by Linux