Re: [PATCHv2 1/3] block: introduce rq_list_for_each_safe macro

On 1/6/22 3:54 AM, Max Gurtovoy wrote:
> 
> On 1/5/2022 7:26 PM, Keith Busch wrote:
>> On Tue, Jan 04, 2022 at 02:15:58PM +0200, Max Gurtovoy wrote:
>>> This patch worked for me with 2 namespaces for NVMe PCI.
>>>
>>> I'll check it later on with my RDMA queue_rqs patches as well. There we
>>> also have tagset sharing with the connect_q (and not only with multiple
>>> namespaces).
>>>
>>> But the connect_q is using reserved tags only (for the connect commands).
>>>
>>> I saw some strange things that I couldn't understand:
>>>
>>> 1. running randread fio with the libaio ioengine didn't call nvme_queue_rqs -
>>> expected
>>>
>>> 2. running randwrite fio with the libaio ioengine did call nvme_queue_rqs -
>>> Not expected !!
>>>
>>> 3. running randread fio with the io_uring ioengine (and --iodepth_batch=32)
>>> didn't call nvme_queue_rqs - Not expected !!
>>>
>>> 4. running randwrite fio with the io_uring ioengine (and --iodepth_batch=32)
>>> did call nvme_queue_rqs - expected
>>>
>>> 5. running randread fio with the io_uring ioengine (and --iodepth_batch=32
>>> --runtime=30) didn't finish after 30 seconds and got stuck for 300 seconds
>>> (the fio jobs required "kill -9 fio" to remove the refcounts from nvme_core) -
>>> Not expected !!
>>>
>>> debug print: fio: job 'task_nvme0n1' (state=5) hasn't exited in 300
>>> seconds, it appears to be stuck. Doing forceful exit of this job.
>>>
>>> 6. running randwrite fio with the io_uring ioengine (and --iodepth_batch=32
>>> --runtime=30) didn't finish after 30 seconds and got stuck for 300 seconds
>>> (the fio jobs required "kill -9 fio" to remove the refcounts from nvme_core) -
>>> Not expected !!
>>>
>>> debug print: fio: job 'task_nvme0n1' (state=5) hasn't exited in 300
>>> seconds, it appears to be stuck. Doing forceful exit of this job.
>>>
>>> Any idea what could cause these unexpected scenarios? At least unexpected
>>> for me :)
>> Not sure about all the scenarios. I believe it should call queue_rqs
>> anytime we finish a plugged list of requests as long as the requests
>> come from the same request_queue, and it's not being flushed from
>> io_schedule().
> 
> I also see that we have batch > 1 only at the start of the fio run.
> After X IO operations the batch size is == 1 until the end of the fio run.

There are two settings for completion batching; you're likely not setting
them? That in turn will prevent the submit side from submitting more than
1 at a time, as that's all that's left.
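(Assuming a recent fio, the two completion-batch options would be
iodepth_batch_complete_min and iodepth_batch_complete_max, alongside
iodepth_batch_submit on the submit side. The invocation below is only an
illustrative guess, not taken from Max's actual setup:)

  fio --name=task_nvme0n1 --filename=/dev/nvme0n1 --ioengine=io_uring \
      --rw=randread --iodepth=32 --iodepth_batch_submit=32 \
      --iodepth_batch_complete_min=16 --iodepth_batch_complete_max=32 \
      --runtime=30 --time_based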

>> The stuck fio job might be a lost request, which is what this series
>> should address. It would be unusual to see such an error happen in
>> normal operation, though. I had to synthesize errors to verify the bug
>> and fix.
> 
> But there are no timeout errors or prints in dmesg.
> 
> If there were timeout prints, I would expect the issue might be in the
> local NVMe device, but there aren't any.
> 
> Also, this phenomenon doesn't happen with the NVMf/RDMA code I developed
> locally.

There would only be a timeout if it wasn't lost. Keith's patches fixed a
case where it was simply dropped from the list. As it was never started,
it won't get timed out.
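
(For illustration, here is a minimal standalone sketch of that failure mode,
using mock types rather than the kernel's struct request and rq_list helpers:
if the loop body unlinks the current entry, a plain list walk silently drops
the rest of the list - exactly the kind of lost, never-started request
described above - while a "safe" walk samples the successor before the body
runs:)

/*
 * Mock types only - not the kernel's struct request and not the actual
 * rq_list_for_each_safe() from the patch; this only demonstrates the
 * iteration pattern.
 */
#include <stdio.h>

struct mock_rq {
	int tag;
	struct mock_rq *rq_next;	/* singly linked, like the plug list */
};

/* plain walk: breaks if the body changes pos->rq_next */
#define mock_rq_list_for_each(listptr, pos) \
	for (pos = *(listptr); pos; pos = pos->rq_next)

/* safe walk: the successor is sampled before the body can unlink "pos" */
#define mock_rq_list_for_each_safe(listptr, pos, nxt)		\
	for (pos = *(listptr), nxt = pos ? pos->rq_next : NULL;	\
	     pos; pos = nxt, nxt = pos ? pos->rq_next : NULL)

int main(void)
{
	struct mock_rq rqs[4] = {
		{ .tag = 0, .rq_next = &rqs[1] },
		{ .tag = 1, .rq_next = &rqs[2] },
		{ .tag = 2, .rq_next = &rqs[3] },
		{ .tag = 3, .rq_next = NULL },
	};
	struct mock_rq *list = &rqs[0], *req, *next;

	mock_rq_list_for_each_safe(&list, req, next) {
		/* pretend odd tags fail to submit and get unlinked here */
		if (req->tag & 1)
			req->rq_next = NULL;
		printf("visited tag %d\n", req->tag);
	}
	/*
	 * All four tags are visited. With the plain walk above, unlinking
	 * tag 1 would end the loop early, so tags 2 and 3 would never be
	 * issued - lost without ever being started, hence no timeout.
	 */
	return 0;
}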

-- 
Jens Axboe



