RE: Sighting: io_uring requests on unexpected core

>> Here is the issue:
>> While running an FIO test with a single job thread pinned to a specific CPU, I can trap requests entering the nvmf layer from a core, and on a queue, not aligned with the CPU specified to FIO.
>> I can replicate this on the baseline nvme-5.5-rc or nvme-5.7-rc1 branches of the infradead repository, with no other patches applied.
>> For a typical 30 second 4k 100% read test, over 2 million packets are processed, with fewer than 100 sent by this other CPU to a different queue.  When this occurs it causes a 1-3% drop in performance.
>> My nvmf queue configuration is 1 nr_io_queue and 104 nr_poll_queues, which equals the number of active cores in the system.

> Given that you pin your fio thread, the high poll queue count shouldn't really matter, I assume.

>> As indicated, this is while running an FIO test using io_uring for 100% random reads.  I have seen this with a queue depth of 1 and batch of 1, as well as a queue depth of 32 and batch of 8.
>> 
>>   The basic command line being:
>> /fio --filename=/dev/nvme0n1 --time_based --runtime=30 --ramp_time=10 \
>>   --thread --rw=randrw --rwmixread=100 --refill_buffers --direct=1 \
>>   --ioengine=io_uring --hipri --fixedbufs --bs=4k --iodepth=32 \
>>   --iodepth_batch_complete_min=1 --iodepth_batch_complete_max=32 \
>>   --iodepth_batch=8 --numjobs=1 --group_reporting --gtod_reduce=0 \
>>   --disable_lat=0 --name=cpu3 --cpus_allowed=3
>> 
>> Adding monitoring within the functions nvme_tcp_queue_request() and nvme_tcp_poll(), I see the following.  Polling happens from the expected CPU, for different queues with different assigned CPUs [queue->io_cpu].  New queue requests occasionally come in on an unexpected CPU [not the one directed on the FIO invocation], on a queue whose context is assigned that same CPU value.  Note: even when requests come in on different CPU cores, all polling is from the same expected CPU core.
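
The debug prints are along these lines (a simplified sketch, not the exact patch; it assumes pr_debug() calls added inside nvme_tcp_queue_request() and nvme_tcp_poll() in drivers/nvme/host/tcp.c, comparing queue->io_cpu with the CPU executing the call):

/* Sketch of the instrumentation described above (not the exact patch).
 * queue->io_cpu is the CPU the nvme-tcp queue was bound to at connect
 * time; raw_smp_processor_id() is the CPU actually executing the call.
 */

/* inside nvme_tcp_queue_request() */
pr_debug("nvme_tcp_queue_request: IO-Q [Queue CPU %d], [CPU %d]\n",
	 queue->io_cpu, raw_smp_processor_id());

/* inside nvme_tcp_poll() */
pr_debug("nvme_tcp_poll: [Queue CPU %d], [CPU %d]\n",
	 queue->io_cpu, raw_smp_processor_id());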

> nvme_tcp_poll: [Queue CPU 3], [CPU 3] means that the poll is called on cpu core [3] on a queue that is mapped to cpu core [3], correct?
Correct.

> nvme_tcp_poll: [Queue CPU 75], [CPU 3] means that the poll is called on cpu core [3] on a queue that is mapped to cpu core [75], correct?
Correct.

>> [  524.867622] nvme_tcp:        nvme_tcp_poll: [Queue CPU 3], [CPU 3]
>> [  524.867686] nvme_tcp:        nvme_tcp_poll: [Queue CPU 75], [CPU 3]

> I'm assuming that this is a poll invocation of a prior submission to a queue that is mapped to CPU 75?
Correct.  I just grabbed a section of debug output that showed all details.

>> [  524.867693] nvme_tcp:        nvme_tcp_poll: [Queue CPU 3], [CPU 3]
>> [  524.867755] nvme_tcp: nvme_tcp_queue_request: IO-Q [Queue CPU 75], [CPU 75]

> This log print means that on cpu core [3] we see a request submitted on a queue that is mapped to cpu core [75], correct?
Actually, this print indicates that CPU 75 has called queue_request for a queue that is mapped to CPU 75.  So it highlights the issue: another CPU, other than the one selected on the FIO command line, has called into nvmf to issue an I/O request.

>> [  524.867758] nvme_tcp:        nvme_tcp_poll: [Queue CPU 75], [CPU 3]
>> [  524.867777] nvme_tcp: nvme_tcp_queue_request: IO-Q [Queue CPU 3], [CPU 3]
>> [  524.867781] nvme_tcp:        nvme_tcp_poll: [Queue CPU 3], [CPU 3]
>> [  524.867853] nvme_tcp:        nvme_tcp_poll: [Queue CPU 75], [CPU 3]
>> [  524.867864] nvme_tcp:        nvme_tcp_poll: [Queue CPU 3], [CPU 3]
>> 
>> So, if someone can help solve this puzzle and help me understand what is causing this behavior, that would be great.  It is hard for me to believe this is expected, or beneficial, behavior: needing to use some other core/queue for fewer than 100 requests out of over 2 million.

> I'm assuming that this phenomenon happens also without polling?
Yes, if I turn off polling by not selecting --hipri on the command line, I still see the queue_request behavior, but obviously not the calls to nvme_tcp_poll.
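
If it helps narrow down who is doing the stray submissions, a possible follow-up at the same instrumentation point (just a sketch, not something I have run) would be to warn and capture a backtrace whenever a request is queued from a CPU other than the one the queue is mapped to:

/* Hypothetical addition inside nvme_tcp_queue_request(): report once,
 * with a backtrace, when the submitting CPU does not match the CPU the
 * queue is mapped to, to see which path issued the request.
 */
WARN_ONCE(queue->io_cpu != raw_smp_processor_id(),
	  "nvme_tcp: request queued from CPU %d on queue mapped to CPU %d\n",
	  raw_smp_processor_id(), queue->io_cpu);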

> Anyway, this is unexpected to me. Given that you have a queue that is mapped to the CPU you are pinning on, I'd expect all requests generated on that CPU to be submitted on that same queue.

> Does anyone have any insights on this?



