On Fri, Aug 24, 2018 at 06:33:29PM -0600, Jens Axboe wrote:
> On 8/24/18 6:21 PM, Jens Axboe wrote:
> > On 8/24/18 5:16 PM, Ming Lei wrote:
> >> Hi,
> >>
> >> On Fri, Aug 24, 2018 at 04:20:41PM -0600, Jens Axboe wrote:
> >>> Hi,
> >>>
> >>> Was testing other things today, but ended up with this:
> >>>
> >>> # echo "write through" > /sys/block/sde/device/scsi_disk/4:0:0:0/cache_type
> >>>
> >>> hanging. Looking closer, the request is successfully queued and the
> >>> caller is waiting on rq execution and completion, but the request is
> >>> sitting in the hctx->dispatch list and is continually being attempted
> >>> issued, but gets a BLK_STS_RESOURCE return.
> >>
> >> Just run fio randwrite and 'dbench -s' on virtio-scsi/usb-storage
> >> after setting 'write through', looks not see such issue.
> >>
> >> Also not see such kind of issue on blktests/xfstests against today's
> >> next tree too.
> >>
> >> Could you share a bit more(disk, io sched, dmesg log, workload) about
> >> how to reproduce it? Is it in normal IO path or EH?
> >
> > You're misunderstanding. The echo "write through" is the one that hangs,
> > not subsequent IO. As written above, that first spawns a TUR and that
> > request is being inserted, and the caller ends up waiting for it to
> > complete off blk_execute_rq(). But the request itself sits on the
> > dispatch list, gets dispatched, and gets BLK_STS_RESOURCE off
> > ->queue_rq(). It goes back on the dispatch list, and the process repeats
> > indefinitely since it always gets a BUSY return. On the SCSI side, what
> > happens is that scsi_host_queue_ready() keeps returning false, which is
> > why we keep returning BLK_STS_RESOURCE and not making any progress at
> > all.
>
> Task doing the echo:
>
> [<0>] blk_execute_rq+0x77/0xa0
> [<0>] __scsi_execute+0xd3/0x1f0
> [<0>] sd_revalidate_disk+0xda/0x1cd0 [sd_mod]
> [<0>] revalidate_disk+0x20/0x80
> [<0>] cache_type_store+0x1f7/0x210 [sd_mod]
> [<0>] kernfs_fop_write+0x106/0x190
> [<0>] __vfs_write+0x23/0x150
> [<0>] vfs_write+0xbe/0x1b0
> [<0>] ksys_write+0x45/0xa0
> [<0>] do_syscall_64+0x42/0x100
> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [<0>] 0xffffffffffffffff
>
> and the host is in perpetual recovery mode:
>
> # cat /sys/bus/scsi/devices/host4/scsi_host/host4/state
> recovery
>
> This is a normal SATA drive, hanging off ahci, queue depth 32. As mentioned
> earlier, scsi_host_queue_ready() keeps returning false.

Thanks, I can reproduce it now on sata, will investigate it a bit.

Thanks,
Ming
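
For anyone following along, the loop described above can be illustrated with a toy user-space model. This is only a sketch and not kernel code: every `toy_` name is hypothetical, and the readiness check is an assumed simplification of what scsi_host_queue_ready() does (refuse while the host is in recovery, otherwise compare in-flight commands against the queue depth). The retry loop is capped, which the real dispatch path as described in the report is not.

```c
#include <stdbool.h>

/* Toy model only: all names here are hypothetical, not kernel symbols. */
enum toy_status { TOY_STS_OK, TOY_STS_RESOURCE };
enum toy_host_state { TOY_HOST_RUNNING, TOY_HOST_RECOVERY };

struct toy_host {
    enum toy_host_state state;
    int busy;       /* commands currently in flight */
    int can_queue;  /* queue depth (32 in the report) */
};

/* Assumed simplification: a host in recovery accepts no new commands. */
static bool toy_host_queue_ready(const struct toy_host *h)
{
    if (h->state == TOY_HOST_RECOVERY)
        return false;
    return h->busy < h->can_queue;
}

/* Stand-in for ->queue_rq(): report "resource busy" when the host is not ready. */
static enum toy_status toy_queue_rq(struct toy_host *h)
{
    if (!toy_host_queue_ready(h))
        return TOY_STS_RESOURCE;
    h->busy++;
    return TOY_STS_OK;
}

/*
 * Stand-in for the dispatch-list retry: returns the number of attempts
 * needed to issue the request, or -1 if max_attempts is exhausted.
 */
static int toy_dispatch(struct toy_host *h, int max_attempts)
{
    for (int i = 1; i <= max_attempts; i++) {
        if (toy_queue_rq(h) == TOY_STS_OK)
            return i;
        /* Busy: the request goes back on the dispatch list and is retried. */
    }
    return -1;
}
```

With a host in the running state, toy_dispatch() issues the request on the first attempt. With the state set to recovery, nothing inside the loop ever changes that state, so every attempt returns busy and the cap is the only thing that ends the loop. That matches the reported behaviour, where the waiter in blk_execute_rq() never sees a completion.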