Re: [PATCH 8/9] block/mq-deadline: Add I/O priority support

Damien Le Moal <Damien.LeMoal@xxxxxxx> · Fri, 28 May 2021 01:40:06 +0000

On 2021/05/28 5:23, Bart Van Assche wrote:
> On 5/27/21 12:07 AM, Hannes Reinecke wrote:
>> On 5/27/21 3:01 AM, Bart Van Assche wrote:
>>> Maintain one FIFO list per I/O priority: RT, BE and IDLE. Only dispatch
>>> requests for a lower priority after all higher priority requests have
>>> finished. Maintain statistics for each priority level. Split the debugfs
>>> attributes per priority level as follows:
>>>
>>> $ ls /sys/kernel/debug/block/.../sched/
>>> async_depth  dispatch2        read_next_rq      write2_fifo_list
>>> batching     read0_fifo_list  starved           write_next_rq
>>> dispatch0    read1_fifo_list  write0_fifo_list
>>> dispatch1    read2_fifo_list  write1_fifo_list
>>
>> Interesting question, though, wrt to request merging and realtime
>> priority: what takes priority?
>> Is the realtime priority more important than request merging?
> 
> We plan to use two I/O priorities: one for foreground applications and
> one for background applications. We want to lower the application
> startup time if background I/O is ongoing. The code that associates
> different cgroups with foreground and background applications is already
> available. We care more about foreground I/O being prioritized over
> background I/O than about foreground I/O being real-time.
> 
>> And also the ioprio document states that there are 8 levels of class
>> data, determining how much time each class should spend on disk access.
>> Have these been taken into consideration?
> 
> My understanding is that the ioprio document was written before any I/O
> controllers implemented I/O priorities. I'm not sure whether I/O
> controllers will ever implement more than two I/O priorities. See also
> commit 8e061784b51e ("ata: Enabling ATA Command Priorities"). A paper
> about the benefits of I/O controllers supporting I/O priorities is
> available at
> https://www.usenix.org/system/files/conference/hotstorage17/hotstorage17-paper-manzanares.pdf.

ATA NCQ has only 2 bits, defining 3 different priority levels, one of which is
not really a priority but rather latency limitation. So yes, basically, ATA
supports only high-priority and no-priority. 2 levels only with all prio levels
of the RT class mapping to the device high-priority level and everything else
mapping to device no-priority level.

That said, Command Duration Limits has been standardized and I was planning to
introduce support for it using a new priority class IOPRIO_CLASS_DURATION with
the priority level indicating the duration limit to apply to the IOs. In this
case, differentiation between the different priority levels of that class may be
necessary at the scheduler level. I will probably revisit this when I send
command duration limit support.

> 
>>>  /*
>>>   * deadline_check_fifo returns 0 if there are no expired requests on the fifo,
>>>   * 1 otherwise. Requires !list_empty(&dd->fifo_list[data_dir])
>>>   */
>>>  static inline int deadline_check_fifo(struct deadline_data *dd,
>>> +				      enum dd_prio prio,
>>>  				      enum dd_data_dir data_dir)
>>>  {
>>> -	struct request *rq = rq_entry_fifo(dd->fifo_list[data_dir].next);
>>> +	struct request *rq = rq_entry_fifo(dd->fifo_list[prio][data_dir].next);
>>>  
>>>  	/*
>>>  	 * rq is expired!
>>
>> I am _so_ not a fan of arrays in C.
>> Can't you make this an accessor and use
>> dd->fifo_list[prio * 2 + data_dir] ?
> 
> That's possible, but if the compiler can translate [prio][data_dir] into
> [prio * 2 + data_dir], should I really do this myself instead of letting
> the compiler generate that transformation?
> 
> Thanks,
> 
> Bart.
> 

-- 
Damien Le Moal
Western Digital Research