On Wed, Jan 25, 2023 at 10:37:52AM -0800, Bart Van Assche wrote:

(snip)

> Hi Damien,
>
> The more I think about this, the more I'm convinced that it would be wrong
> to introduce IOPRIO_CLASS_DL. Datacenters will have a mix of drives that
> support CDL and drives that do not support CDL. It seems wrong to me to
> make user space software responsible for figuring out whether or not the
> drive supports CDL before it can be decided which I/O priority class should
> be used. This is something the kernel should do instead of user space
> software.

Well, take NCQ priority as an example, since that is probably the only
device side I/O priority feature currently supported by the kernel.

If you want to make use of NCQ priority, you first need to enable
/sys/block/sdX/device/ncq_prio_enable and then submit I/O using
IOPRIO_CLASS_RT, so I would argue that users already need to know that a
device supports device side I/O priority if they want to make use of it.

For CDL there are 7 different limits for reads and 7 different limits for
writes, and these limits can be configured by the user. So the users who
want to get the most performance out of their drive will most likely analyze
their workloads and set the limits depending on what their workload
actually looks like.

Bottom line is that heavy users of CDL will absolutely know, in user space,
how the CDL limits are configured, as they will pick the correct CDL index
(prio level) for the descriptor that they want to use for the specific I/O
that they are doing. An I/O scheduler will most likely be disabled.

(For CDL, the limit is measured from the time the command is submitted to
the device, so from the device's PoV, it does not really matter whether a
command is queued for a long time in a scheduler or not. But from an
application PoV, it does not make sense to hold back a command for long if
it e.g. has a short limit.)

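
As a rough illustration of the NCQ priority flow described above, here is a
minimal Python sketch of the user space side: check whether
ncq_prio_enable is set for a disk, then build the I/O priority value for
IOPRIO_CLASS_RT. The helper names ioprio_value() and ncq_prio_enabled() and
the sysfs_root parameter are made up for this example. The constants follow
the classic form of the IOPRIO_PRIO_VALUE() macro from
include/uapi/linux/ioprio.h. The sketch only reads sysfs and computes the
value; it does not submit any I/O.

```python
import os

# Constants as defined in the kernel UAPI header include/uapi/linux/ioprio.h.
IOPRIO_CLASS_SHIFT = 13
IOPRIO_CLASS_RT = 1


def ioprio_value(ioprio_class: int, level: int) -> int:
    """Pack class and level the way the classic IOPRIO_PRIO_VALUE() does."""
    if not 0 <= level <= 7:
        raise ValueError("priority level must be in 0..7")
    return (ioprio_class << IOPRIO_CLASS_SHIFT) | level


def ncq_prio_enabled(disk: str, sysfs_root: str = "/sys/block") -> bool:
    """Return True if <sysfs_root>/<disk>/device/ncq_prio_enable reads 1.

    sysfs_root is a hypothetical parameter, added only so the function can
    be exercised against a fake directory tree.
    """
    path = os.path.join(sysfs_root, disk, "device", "ncq_prio_enable")
    try:
        with open(path) as f:
            return f.read().strip() == "1"
    except OSError:
        # Attribute missing or unreadable: treat as "not enabled".
        return False
```

An application would call ncq_prio_enabled("sdX") first and, only if that
returns True, pass ioprio_value(IOPRIO_CLASS_RT, level) to ioprio_set(2) or
set it per I/O. That final step is left out here because the Python standard
library has no wrapper for it.
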

If we were to reuse IOPRIO_CLASS_RT, then I guess the best option would be
to have something like:

$ cat /sys/block/sdX/device/rt_prio_backend
[none] ncq-prio cdl

Devices that do not support ncq-prio or cdl, e.g. currently NVMe, would just
have none (i.e. RT simply means higher host side priority, if a scheduler is
used).

SCSI would then have none and cdl (for SCSI devices supporting CDL).

ATA would have none, ncq-prio and cdl (for ATA devices supporting CDL).

That would theoretically avoid another ioprio class, but like I've just
explained, a user space application making use of CDL would for sure know
what the descriptors look like anyway, so I'm not sure if there is an actual
benefit of doing it this way over simply having an IOPRIO_CLASS_DL.

I guess the only benefit would be that we would avoid introducing another
I/O priority class (at the expense of additional complexity elsewhere).


Kind regards,
Niklas