OK, I think we can hit this for fabrics too; need to figure out how to
handle it there as well.
Do you have a reproducer?
To repro, I run a buffered writer workload, then put the system into S3.
This fio job seems to reproduce for me:
fio --name=global --filename=/dev/nvme0n1 --bsrange=4k-128k --rw=randwrite --ioengine=libaio --iodepth=8 --numjobs=8 --name=foobar
I use rtcwake to test suspend/resume:
rtcwake -m mem -s 10
Without the patch, we get stuck after "Disabling non-boot CPUs ..."
when blk-mq waits to freeze queues that still hold entered requests after nvme was disabled.
I'm observing the same thing when hibernating during mdraid resync on
nvme - it hangs in blk_mq_freeze_queue_wait() after "Disabling non-boot
CPUs ...". This patch did not help, but when I put nvme_wait_freeze()
right after nvme_start_freeze(), it appeared to work.
Interesting. Did the nvme device manage to shut down at all?
Maybe the difference here is that requests are submitted from a
non-freezable kernel thread (md sync_thread)?
Don't think it's related...