On 1/25/23 07:43, Bart Van Assche wrote:
> On 1/24/23 13:29, Damien Le Moal wrote:
>> I/O priority at the device level does not exist with SAS and with SATA,
>> the ACS specifications mandates that NCQ I/O priority and CDL cannot be
>> used mixed together. So from the device point of view, I/O priority and
>> CDL are mutually exclusive. No issues.
>>
>> Now, if you are talking about the host level I/O priority scheduling done
>> by mq-deadline and bfq, the CDL priority class maps to the RT class. They
>> are the same, as they should. There is nothing more real-time than CDL in
>> my opinion :)
>>
>> Furthermore, if we do not reuse the I/O priority interface, we will have
>> to add another field to BIOs & requests to propagate the cdl index from
>> user space down to the device LLD, almost exactly in the manner of I/O
>> priorities, including all the controls with merging etc. That would be a
>> lot of overhead to achieve the possibility of prioritized CDL commands.
>>
>> CDL in of itself allows the user to define "prioritized" commands by
>> defining CDLs on the drive that are sorted in increasing time limit order,
>> i.e. with low CDL index numbers having low limits, and higher priority
>> within the class (as CDL index == prio level). With that, schedulers can
>> still do the right thing as they do now, with the additional benefit that
>> they can even be improved to base their scheduling decisions on a known
>> time limit for the command execution. But such optimization is not
>> implemented by this series.
>
> Hi Damien,
>
> What if a device that supports I/O priorities (e.g. an NVMe device that
> supports configuring the SQ priority) and a device that supports command
> duration limits (e.g. a SATA hard disk) are combined via the device
> mapper into a single block device?
> Should I/O be submitted to the dm
> device with one of the existing I/O priority classes (not supported by
> SATA hard disks) or with I/O priority class IOPRIO_CLASS_DL (not
> supported by NVMe devices)?

That is not a use case we considered. My gut feeling is that this is
something the target driver should handle when processing a user IO.
Note that I was not aware that the Linux NVMe driver supported queue
priorities...

> Shouldn't the ATA core translate the existing I/O priority levels into a
> command duration limit instead of introducing a new I/O priority class
> that is only supported by ATA devices?

There is only one priority class that ATA understands: RT (the level is
irrelevant and ignored). All RT class IOs are mapped to high priority NCQ
commands. All other classes map to normal priority (no priority bit set)
commands.

And sure, we could map the level of RT class IOs to a CDL index, as we do
for the CDL class, but what would be the point? The user should use the
CDL class in that case.

Furthermore, there is one additional thing that we do not yet support but
will later: CDL descriptor 0 can be used to set a target time limit for
high priority NCQ commands. Without this new feature introduced with CDL,
the drive is free to schedule high priority NCQ commands as it wants, and
that is hard coded in FW. So you can end up with very aggressive
scheduling leading to a significant overall IOPS drop and long tail
latency for low priority commands. See pages 11 and 20 of this
presentation for an example:

https://www.snia.org/sites/default/files/SDC/2021/pdfs/SNIA-SDC21-LeMoal-Be-On-Time-command-duration-limits-Feature-Support-in%20Linux.pdf

For a drive that supports both CDL and NCQ priority, with the CDL feature
turned off, CDL descriptor 0 defines the time limit hint for high priority
NCQ commands. Again, CDL and NCQ high priority are mutually exclusive.

So for clarity, I really would prefer separating CDL and RT classes as we
did.
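
A rough sketch of the mapping described above, for illustration only and
not the code from this series. The class/level packing (class in the top
3 bits, shift of 13) and the NONE/RT/BE/IDLE values follow the existing
ioprio layout. Everything with an sk_/SK_ prefix is a made-up name, the
numeric value of the new DL class is an assumption, and returning -1 for
"no CDL descriptor requested" is just a convention of this sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Same packing as an ioprio value: class in the top bits, level below. */
#define SK_CLASS_SHIFT	13
#define SK_LEVEL_MASK	((1u << SK_CLASS_SHIFT) - 1)

enum sk_class {
	SK_CLASS_NONE = 0,
	SK_CLASS_RT   = 1,
	SK_CLASS_BE   = 2,
	SK_CLASS_IDLE = 3,
	SK_CLASS_DL   = 4,	/* assumed value for the new CDL class */
};

/* Pack a class and a level into a single priority value. */
static unsigned int sk_prio_value(enum sk_class cls, unsigned int level)
{
	assert(level <= SK_LEVEL_MASK);
	return ((unsigned int)cls << SK_CLASS_SHIFT) | level;
}

static enum sk_class sk_prio_class(unsigned int ioprio)
{
	return (enum sk_class)(ioprio >> SK_CLASS_SHIFT);
}

static unsigned int sk_prio_level(unsigned int ioprio)
{
	return ioprio & SK_LEVEL_MASK;
}

/*
 * RT class IOs become high priority NCQ commands, whatever the level.
 * Every other class is issued without the priority bit.
 */
static bool sk_ncq_high_prio(unsigned int ioprio)
{
	return sk_prio_class(ioprio) == SK_CLASS_RT;
}

/*
 * For the DL class, the level is used directly as the CDL index.
 * Any other class requests no CDL descriptor (-1 in this sketch).
 */
static int sk_cdl_index(unsigned int ioprio)
{
	if (sk_prio_class(ioprio) != SK_CLASS_DL)
		return -1;
	return (int)sk_prio_level(ioprio);
}
```

With these helpers, an IO tagged (RT, 5) and one tagged (RT, 0) are both
issued as high priority NCQ commands, while (DL, 2) selects CDL index 2
and never sets the NCQ priority bit, which keeps the two mechanisms
separate.
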
We could integrate CDL support reusing the RT class + level for CDL
index, but I think this may be very confusing for users, especially
considering that the CDLs on a drive can be defined in any order the user
wants, resulting in indexes/levels that do not have any particular order,
making it impossible for the host to correctly schedule commands.

-- 
Damien Le Moal
Western Digital Research