> On 19 Dec 2018, at 11:32, Ming Lei <tom.leiming@xxxxxxxxx> wrote:
>
> On Wed, Dec 19, 2018 at 2:18 PM Paolo Valente <paolo.valente@xxxxxxxxxx> wrote:
>>
>>> On 19 Dec 2018, at 04:45, Ming Lei <tom.leiming@xxxxxxxxx> wrote:
>>>
>>> On Wed, Dec 19, 2018 at 2:52 AM Jens Axboe <axboe@xxxxxxxxx> wrote:
>>>>
>>>> On 12/18/18 5:45 AM, Paolo Valente wrote:
>>>>> Hi Jens,
>>>>> sorry for the following silly question, but maybe you can quickly
>>>>> resolve a doubt that would otherwise take me much longer to investigate.
>>>>>
>>>>> While doing some tests with scsi_debug, I've just seen that (at least)
>>>>> with direct I/O, the maximum number of pending I/O requests (at least
>>>>> in the I/O schedulers) is unexpectedly equal to the queue depth of
>>>>> the drive and not to
>>>>> /sys/block/<dev>/queue/nr_requests
>>>>>
>>>>> For example, after:
>>>>>
>>>>> sudo modprobe scsi_debug max_queue=4
>>>>>
>>>>> and with fio executed as follows:
>>>>>
>>>>> job: (g=0): rw=read, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=20
>>>>>
>>>>> I get this periodic trace, where four insertions are followed by four
>>>>> completions, and so on, until the end of the I/O. This trace is taken
>>>>> with none, but the result is the same with bfq.
>>>>>
>>>>> fio-20275 [001] d... 7560.655213: 8,48 I R 281088 + 8 [fio]
>>>>> fio-20275 [001] d... 7560.655288: 8,48 I R 281096 + 8 [fio]
>>>>> fio-20275 [001] d... 7560.655311: 8,48 I R 281104 + 8 [fio]
>>>>> fio-20275 [001] d... 7560.655331: 8,48 I R 281112 + 8 [fio]
>>>>> <idle>-0 [001] d.h. 7560.749868: 8,48 C R 281088 + 8 [0]
>>>>> <idle>-0 [001] dNh. 7560.749912: 8,48 C R 281096 + 8 [0]
>>>>> <idle>-0 [001] dNh. 7560.749928: 8,48 C R 281104 + 8 [0]
>>>>> <idle>-0 [001] dNh. 7560.749934: 8,48 C R 281112 + 8 [0]
>>>>> fio-20275 [001] d... 7560.750023: 8,48 I R 281120 + 8 [fio]
>>>>> fio-20275 [001] d... 7560.750196: 8,48 I R 281128 + 8 [fio]
>>>>> fio-20275 [001] d... 7560.750229: 8,48 I R 281136 + 8 [fio]
>>>>> fio-20275 [001] d... 7560.750250: 8,48 I R 281144 + 8 [fio]
>>>>> <idle>-0 [001] d.h. 7560.842510: 8,48 C R 281120 + 8 [0]
>>>>> <idle>-0 [001] dNh. 7560.842551: 8,48 C R 281128 + 8 [0]
>>>>> <idle>-0 [001] dNh. 7560.842556: 8,48 C R 281136 + 8 [0]
>>>>> <idle>-0 [001] dNh. 7560.842562: 8,48 C R 281144 + 8 [0]
>>>>>
>>>>> Shouldn't the total number of pending requests reach
>>>>> /sys/block/<dev>/queue/nr_requests ?
>>>>>
>>>>> The latter is of course equal to 8.
>>>>
>>>> With a scheduler, the depth is what the scheduler provides. You cannot
>>>> exceed the hardware queue depth in any situation. You just have 8
>>>> requests available for scheduling, with a max of 4 being in flight on
>>>> the device side.
>>>>
>>>> If both were 4, for instance, then you would have nothing to schedule
>>>> with, as all of them could reside on the hardware side. That's why
>>>> the scheduler defaults to twice the hardware queue depth.
>>>
>>> The default of twice the hw queue depth might not be reasonable for
>>> multiple LUNs.
>>>
>>> Maybe it should be set to twice sdev->queue_depth for SCSI, or to
>>> hw queue depth / hctx->nr_active. But either way may become complicated
>>> because both can be adjusted at runtime.
>>>
>>
>> Could you please explain why it is not working (if it is not working)
>> in my example, where there should be only one LUN?
>
> I didn't say it isn't working; I mean it isn't perfect.
>
> The hardware queue depth is host-wide, which means it is shared by all LUNs.
> Of course, lots of LUNs may be attached to one single HBA. You can easily
> set this up via 'modprobe scsi_debug max_luns=16 max_queue=4', and then all
> the LUNs share the 4 tags.
>

Ok, so you are talking about the opposite problem, in a sense. What I'm
saying here is that the tags should have been 8, as Jens pointed out, but
they are 4.

Thanks,
Paolo

> Thanks,
> Ming Lei
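
For the record, a minimal user-space sketch of the arithmetic being discussed
(illustration only, not kernel code; the names hw_queue_depth, nr_luns and
SCHED_DEPTH_FACTOR are made up for this example): the scheduler exposes twice
the hardware queue depth as schedulable requests, while the in-flight limit,
shared across all LUNs of the host, stays at the hardware queue depth.

    /*
     * Illustration only, not kernel code: hw_queue_depth, nr_luns and
     * SCHED_DEPTH_FACTOR are hypothetical names mirroring the arithmetic
     * described in the thread above.
     */
    #include <stdio.h>

    #define SCHED_DEPTH_FACTOR 2   /* "twice the hardware queue depth" */

    int main(void)
    {
            unsigned int hw_queue_depth = 4;  /* scsi_debug max_queue=4 */
            unsigned int nr_luns = 16;        /* scsi_debug max_luns=16 */

            /* Requests the scheduler can hold (what nr_requests reports). */
            unsigned int nr_requests = SCHED_DEPTH_FACTOR * hw_queue_depth;

            printf("schedulable requests (nr_requests): %u\n", nr_requests);
            printf("max in flight on the device side  : %u\n", hw_queue_depth);

            /*
             * The hardware tags are host-wide: all nr_luns LUNs compete for
             * the same hw_queue_depth tags, so per-LUN service can drop well
             * below the nominal depth when several LUNs are busy.
             */
            printf("hardware tags shared by %u LUNs    : %u in total\n",
                   nr_luns, hw_queue_depth);
            return 0;
    }

With max_queue=4 this prints 8 schedulable requests against 4 hardware tags,
i.e. the 2x relationship Jens describes; whether a given workload actually
keeps all 8 occupied at once is a separate question.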