On Thu, Jan 26, 2023 at 09:24:12AM +0900, Damien Le Moal wrote: > On 2023/01/26 6:23, Niklas Cassel wrote: > > On Wed, Jan 25, 2023 at 10:37:52AM -0800, Bart Van Assche wrote: (snip) > > If we were to reuse IOPRIO_CLASS_RT, then I guess the best option would be > > to have something like: > > > > $ cat /sys/block/sdX/device/rt_prio_backend > > [none] ncq-prio cdl > > No need for this. We can keep the existing ncq_prio_enable and the proposed > duration_limits/enable sysfs attributes. The user cannot enable both at the same > time with our patches. So if the user enables ncq_prio_enable, then it will get > high priority NCQ commands mapping for any level of the RT class. If > duration_limits/enable is set, then the user will get CDL scheduling of commands > on the drive. > > But again, the difficulty with this overloading is that we *cannot* implement a > solid level-based scheduling in IO schedulers because ordering the CDLs in a > meaningful way is impossible. So BFQ handling of the RT class would likely not > result in the most ideal scheduling (that would depend heavily on how the CDL > descriptors are defined on the drive). Hence my reluctance to overload the RT > class for CDL. Well, if CDL were to reuse IOPRIO_CLASS_RT, then the user would either have to disable the IO scheduler, so that lower classdata levels wouldn't be prioritized over higher classdata levels, or simply use an IO scheduler that does not care about the classdata level, e.g. mq-deadline. >From ionice man page: -n, --classdata level Specify the scheduling class data. This only has an effect if the class accepts an argument. For realtime and best-effort, 0-7 are valid data (priority levels), and 0 represents the highest priority level. I guess it kind of made sense for NCQ priority to piggyback on IOPRIO_CLASS_RT, since the only thing that libata has to do is to set singular the high prio bit (so the classdata could still be used for prioritizing IOs on the host side). However, for CDL, things are not as simple as setting a single bit in the command, because of all the different descriptors, so we must let the classdata represent the device side priority level, and not the host side priority level (as we cannot have both, and I agree with you, it is very hard define an order between the descriptors.. e.g. should a 20 ms policy 0xf descriptor be ranked higher or lower than a 20 ms policy 0xd descriptor?). It's best to let the definition for IOPRIO_CLASS_RT stay the way it always has, 0 represents the highest priority level, 7 the lowest priority level (and we wouldn't be able to change how the schedulers handle IOPRIO_CLASS_RT anyway). If we update the man page with a IOPRIO_CLASS_DL entry, we could clearly state that IO schedulers do not care about the classdata at all for IOPRIO_CLASS_DL (and that the classdata is solely used to convey a priority state/index to the device). So I think this patch is good as it is. Bart, do you agree? Any chance for a Reviewed-by? Kind regards, Niklas