On 1/5/2022 7:26 PM, Keith Busch wrote:
On Tue, Jan 04, 2022 at 02:15:58PM +0200, Max Gurtovoy wrote:
This patch worked for me with 2 namespaces for NVMe PCI.
I'll check it later with my RDMA queue_rqs patches as well. There we also have
tag set sharing with the connect_q (and not only with multiple namespaces),
but the connect_q uses reserved tags only (for the connect commands).
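For reference, the reserved-tags arrangement looks roughly like this (a minimal
sketch with illustrative names and numbers, not the actual fabrics code):

    #include <linux/blk-mq.h>

    #define EXAMPLE_RESERVED_TAGS  2  /* illustrative reserve size */

    /* the I/O tag set keeps a small reserved pool for connect commands */
    static void example_tagset_init(struct blk_mq_tag_set *set)
    {
            set->queue_depth   = 128;  /* illustrative */
            set->reserved_tags = EXAMPLE_RESERVED_TAGS;
    }

    /* requests on the connect_q are allocated from the reserved pool only */
    static struct request *example_alloc_connect_rq(struct request_queue *connect_q)
    {
            return blk_mq_alloc_request(connect_q, REQ_OP_DRV_OUT,
                                        BLK_MQ_REQ_RESERVED | BLK_MQ_REQ_NOWAIT);
    }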
I saw some strange things that I couldn't understand:
1. running randread fio with libaio ioengine didn't call nvme_queue_rqs -
expected
2. running randwrite fio with libaio ioengine did call nvme_queue_rqs - Not
expected!!
3. running randread fio with io_uring ioengine (and --iodepth_batch=32)
didn't call nvme_queue_rqs - Not expected!!
4. running randwrite fio with io_uring ioengine (and --iodepth_batch=32) did
call nvme_queue_rqs - expected
5. running randread fio with io_uring ioengine (and --iodepth_batch=32
--runtime=30) didn't finish after 30 seconds and was stuck for 300 seconds
(the fio jobs required "kill -9 fio" to drop the refcounts on nvme_core) - Not
expected!!
   debug print: fio: job 'task_nvme0n1' (state=5) hasn't exited in 300
seconds, it appears to be stuck. Doing forceful exit of this job.
6. running randwrite fio with io_uring ioengine (and --iodepth_batch=32
--runtime=30) didn't finish after 30 seconds and was stuck for 300 seconds
(the fio jobs required "kill -9 fio" to drop the refcounts on nvme_core) - Not
expected!!
   debug print: fio: job 'task_nvme0n1' (state=5) hasn't exited in 300
seconds, it appears to be stuck. Doing forceful exit of this job.
Any idea what could cause these unexpected scenarios? At least they are
unexpected for me :)
Not sure about all the scenarios. I believe it should call queue_rqs
anytime we finish a plugged list of requests as long as the requests
come from the same request_queue, and it's not being flushed from
io_schedule().
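In simplified form, the condition reads roughly like this (a sketch of how I
understand it, not the verbatim block layer code): the batched hook is only
tried when the plugged list belongs to a single queue, no elevator is
involved, and we're not flushing from io_schedule().

    #include <linux/blk-mq.h>

    static void example_flush_plug(struct blk_plug *plug, bool from_schedule)
    {
            struct request *rq = rq_list_peek(&plug->mq_list);
            struct request_queue *q = rq->q;

            if (!plug->multiple_queues && !plug->has_elevator &&
                !from_schedule && q->mq_ops->queue_rqs) {
                    /* hand the whole plugged list to the driver in one call */
                    q->mq_ops->queue_rqs(&plug->mq_list);
                    if (rq_list_empty(plug->mq_list))
                            return;
            }

            /* anything still on the list falls back to one-by-one dispatch */
    }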
I also see that the batch size is > 1 only at the start of the fio run. After
X IO operations the batch size stays at 1 until the end of the run.
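(I'm measuring that with a throwaway counter along these lines - my own debug
hack, assuming the list-based queue_rqs prototype from the series:)

    #include <linux/blk-mq.h>
    #include <linux/printk.h>

    /* throwaway debug helper, called from the driver's ->queue_rqs() */
    static void example_count_batch(struct request **rqlist)
    {
            struct request *req;
            unsigned int batch = 0;

            rq_list_for_each(rqlist, req)
                    batch++;

            pr_info("queue_rqs batch size: %u\n", batch);
    }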
The stuck fio job might be a lost request, which is what this series
should address. It would be unusual to see such an error happen in
normal operation, though. I had to synthesize errors to verify the bug
and fix.
But there are no timeout errors or prints in dmesg. If there were timeout
prints, I would suspect the issue is in the local NVMe device, but there
aren't any.
Also, this phenomenon doesn't happen with the NVMf/RDMA code I developed locally.
In any case, I'll run more multi-namespace tests to see if I can find
any other issues with shared tags.
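For context, the multi-namespace case I mean is simply several request queues
created from the controller's one tag set, roughly like this (an illustrative
sketch, not code from the series):

    #include <linux/blk-mq.h>
    #include <linux/err.h>

    static int example_add_two_namespaces(struct blk_mq_tag_set *ctrl_tagset)
    {
            /* each disk gets its own request_queue but they share one tag pool */
            struct gendisk *ns1 = blk_mq_alloc_disk(ctrl_tagset, NULL);
            struct gendisk *ns2 = blk_mq_alloc_disk(ctrl_tagset, NULL);

            if (IS_ERR(ns1) || IS_ERR(ns2))
                    return -ENOMEM;

            return 0;
    }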
I believe the above concerns are not related to shared tags but to the entire
mechanism.