On 2021/05/28 5:23, Bart Van Assche wrote: > On 5/27/21 12:07 AM, Hannes Reinecke wrote: >> On 5/27/21 3:01 AM, Bart Van Assche wrote: >>> Maintain one FIFO list per I/O priority: RT, BE and IDLE. Only dispatch >>> requests for a lower priority after all higher priority requests have >>> finished. Maintain statistics for each priority level. Split the debugfs >>> attributes per priority level as follows: >>> >>> $ ls /sys/kernel/debug/block/.../sched/ >>> async_depth dispatch2 read_next_rq write2_fifo_list >>> batching read0_fifo_list starved write_next_rq >>> dispatch0 read1_fifo_list write0_fifo_list >>> dispatch1 read2_fifo_list write1_fifo_list >> >> Interesting question, though, wrt to request merging and realtime >> priority: what takes priority? >> Is the realtime priority more important than request merging? > > We plan to use two I/O priorities: one for foreground applications and > one for background applications. We want to lower the application > startup time if background I/O is ongoing. The code that associates > different cgroups with foreground and background applications is already > available. We care more about foreground I/O being prioritized over > background I/O than about foreground I/O being real-time. > >> And also the ioprio document states that there are 8 levels of class >> data, determining how much time each class should spend on disk access. >> Have these been taken into consideration? > > My understanding is that the ioprio document was written before any I/O > controllers implemented I/O priorities. I'm not sure whether I/O > controllers will ever implement more than two I/O priorities. See also > commit 8e061784b51e ("ata: Enabling ATA Command Priorities"). A paper > about the benefits of I/O controllers supporting I/O priorities is > available at > https://www.usenix.org/system/files/conference/hotstorage17/hotstorage17-paper-manzanares.pdf. ATA NCQ has only 2 bits, defining 3 different priority levels, one of which is not really a priority but rather latency limitation. So yes, basically, ATA supports only high-priority and no-priority. 2 levels only with all prio levels of the RT class mapping to the device high-priority level and everything else mapping to device no-priority level. That said, Command Duration Limits has been standardized and I was planning to introduce support for it using a new priority class IOPRIO_CLASS_DURATION with the priority level indicating the duration limit to apply to the IOs. In this case, differentiation between the different priority levels of that class may be necessary at the scheduler level. I will probably revisit this when I send command duration limit support. > >>> /* >>> * deadline_check_fifo returns 0 if there are no expired requests on the fifo, >>> * 1 otherwise. Requires !list_empty(&dd->fifo_list[data_dir]) >>> */ >>> static inline int deadline_check_fifo(struct deadline_data *dd, >>> + enum dd_prio prio, >>> enum dd_data_dir data_dir) >>> { >>> - struct request *rq = rq_entry_fifo(dd->fifo_list[data_dir].next); >>> + struct request *rq = rq_entry_fifo(dd->fifo_list[prio][data_dir].next); >>> >>> /* >>> * rq is expired! >> >> I am _so_ not a fan of arrays in C. >> Can't you make this an accessor and use >> dd->fifo_list[prio * 2 + data_dir] ? > > That's possible, but if the compiler can translate [prio][data_dir] into > [prio * 2 + data_dir], should I really do this myself instead of letting > the compiler generate that transformation? > > Thanks, > > Bart. > -- Damien Le Moal Western Digital Research