Re: [PATCH v3 01/18] block: introduce duration-limits priority class

On 1/28/23 02:23, Bart Van Assche wrote:
> A summary of my concerns is as follows:
> * The current I/O priority levels (RT, BE, IDLE) apply to all block 
> devices. IOPRIO_CLASS_DL is only supported by certain block devices 
> (some but not all SCSI harddisks). This forces applications to check the 
> capabilities of the storage device before it can be decided whether or 
> not IOPRIO_CLASS_DL can be used. This is not something applications 
> should do but something the kernel should do. Additionally, if multiple 
> dm devices are stacked on top of the block device driver, like in 
> Android, it becomes even more cumbersome to check whether or not the 
> block device supports CDL.

Yes, RT, BE and IDLE apply to all block devices. And so does CDL, in the sense
that if a user specifies the CDL class for IOs to a device that does not support
CDL, then nothing special will happen: there will be no differentiation of the
IOs. That is *exactly* what happens when using RT, BE or IDLE with the none
scheduler (e.g. the default nvme setup). The same remark applies to the mapping
of the RT class to the ATA NCQ priority feature: the user needs to check the
device to know if that mapping will happen, *and* also needs to turn on that
feature for the mapping to be effective.

The levels of the CDL priority class are also very well defined: they map to the
CDL descriptors defined on the drive, which users can inspect through sysfs (no
special tools needed), so they are easily discoverable.

As for DM devices, these have no scheduler. So any processing of a priority
class by a DM target driver is that driver's responsibility. Initially, all that
happens is the block layer passing that information down the stack with the
BIOs. That's it. Actual differentiation may happen only once the BIOs reach the
physical block device, and only if an IO scheduler is set for that device.

At that level, the none scheduler is of no concern: nothing will happen. Kyber
also ignores priorities. That leaves only bfq and mq-deadline. The latter only
cares about the priority class and ignores levels, while bfq acts on both class
and level.

IOPRIO_CLASS_DL is equal to 4, so, strictly speaking, it is of lower priority
than the IDLE class if you want to consider it as part of that ordering. But we
defined it as a separate class precisely to avoid having to do that. IO
schedulers can be modified to ignore that priority class for now, mapping it to,
say, the default BE class. Our current patch set maps the CDL class to the RT
class for the schedulers, as that made the most sense given the time-sensitive
nature of CDL workloads. But we can change that to actually let the scheduler
decide, if you want. There are no other changes in the block layer that have or
need special handling of the CDL class. It is all very clean in my opinion: no
special conditions for that feature, no additional "if" in the hot path, no
overhead added.
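To make the "class only, level ignored" handling and the DL-to-RT mapping
concrete, here is a simplified sketch modeled on the table lookup mq-deadline
uses to turn an I/O priority class into one of its internal priority queues.
It is not the code from this series: the class values 0-3 match
<linux/ioprio.h>, IOPRIO_CLASS_DL = 4 is the value proposed in this series, and
dd_class_to_prio() is a hypothetical helper for illustration only.

```c
/* I/O priority classes. Values 0-3 match <linux/ioprio.h>; 4 is the
 * IOPRIO_CLASS_DL value proposed in this series (not a released ABI). */
enum ioprio_class {
	IOPRIO_CLASS_NONE = 0,
	IOPRIO_CLASS_RT   = 1,
	IOPRIO_CLASS_BE   = 2,
	IOPRIO_CLASS_IDLE = 3,
	IOPRIO_CLASS_DL   = 4,
};

/* Scheduler-internal queues, modeled on mq-deadline's RT/BE/IDLE queues. */
enum dd_prio {
	DD_RT_PRIO   = 0,
	DD_BE_PRIO   = 1,
	DD_IDLE_PRIO = 2,
};

/* Only the class selects a queue; the priority level is never looked at.
 * DL is mapped to RT here; mapping it to DD_BE_PRIO instead would be a
 * one-entry change to this table. */
static const enum dd_prio ioprio_class_to_prio[] = {
	[IOPRIO_CLASS_NONE] = DD_BE_PRIO,
	[IOPRIO_CLASS_RT]   = DD_RT_PRIO,
	[IOPRIO_CLASS_BE]   = DD_BE_PRIO,
	[IOPRIO_CLASS_IDLE] = DD_IDLE_PRIO,
	[IOPRIO_CLASS_DL]   = DD_RT_PRIO,
};

/* Hypothetical helper: map a class to a queue, treating unknown classes
 * as best effort. */
static enum dd_prio dd_class_to_prio(unsigned int class)
{
	if (class >= sizeof(ioprio_class_to_prio) / sizeof(ioprio_class_to_prio[0]))
		return DD_BE_PRIO;
	return ioprio_class_to_prio[class];
}
```

The point of the table form is that the CDL class costs one array entry and
no extra branch in the dispatch path.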

> * For the RT, BE and IDLE classes, it is well defined which priority 
> number represents a high priority and which priority number represents a 
> low priority. For CDL, only the drive knows the priority details. I 
> think that application software should be able to select a DL priority 
> without having to read the CDL configuration first.

As noted above, the levels of the CDL priority class are well defined: they map
to the CDL descriptors defined on the drive, which users can inspect through
sysfs. And unless we restrict how CDL descriptors can be defined, which, as I
explained in my previous email, is not desirable at all, we cannot and should
not try to order levels according to some priority semantic. CDL semantics do
not directly define a priority level, only time limits, which may or may not be
ordered, depending on how the limits are defined.

As Niklas pointed out, this is not a "generic" feature that any random
application can magically use without modifications. The application must be
aware of what CDL is and of how the descriptors are defined. And for 99.99% of
the use cases, the CDL descriptors will be defined in a way that is useful for
that application. There is no magic generic set of descriptors defined by
default, though a simple set of increasing time limits can be cleanly mapped to
priority levels. A system administrator is free to do that for the system drives
if that is what the running applications expect. CDL is a very flexible feature
that can cover a lot of use cases. Trying to shoehorn it into the legacy/classic
priority semantic framework would only restrict its usefulness.

> I hope that I have it made it clear that I think that the proposed user 
> space API will be very painful to use for application developers.

I completely disagree. Reusing the prio class/level API made it easy to allow
applications to use the feature. fio support for CDL requires exactly a
*one-line* change, to allow for the CDL class number 4. That's it. From there,
one can use the --cmdprio_class=4 and --cmdprio=idx options to exercise a drive.
The value of "idx" here of course depends on how the descriptors are set on the
drive. But back to the point above: this depends on the application goals, and
the descriptors are set accordingly for that goal. There is no real discovery
needed by the application. The application expects a certain set of CDL limits
for its use case, and checking that this set is the one currently defined on the
drive is easy to do from an application with the sysfs interface we added.
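The "check that the expected set is the one on the drive" step boils down to a
simple comparison once the descriptor values have been read. A minimal sketch
follows, assuming the application has already parsed the descriptors from the
sysfs attributes added by the series into memory (that reading code, and the
exact attribute names, are not shown). struct cdl_desc, its field names and
cdl_find_mismatch() are hypothetical and only illustrate the idea.

```c
#include <stddef.h>

/* Hypothetical in-memory view of one CDL descriptor, in microseconds.
 * Field names are illustrative, not the sysfs attribute names. */
struct cdl_desc {
	unsigned long max_active_us;
	unsigned long max_inactive_us;
	unsigned long duration_guideline_us;
};

/* Compare the descriptor set the application was designed for against
 * the set read from the drive. Returns -1 if all n entries match,
 * otherwise the array index of the first mismatching entry. */
static int cdl_find_mismatch(const struct cdl_desc *expected,
			     const struct cdl_desc *actual, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (expected[i].max_active_us != actual[i].max_active_us ||
		    expected[i].max_inactive_us != actual[i].max_inactive_us ||
		    expected[i].duration_guideline_us !=
			    actual[i].duration_guideline_us)
			return (int)i;
	}
	return -1;
}
```

No ordering of the descriptors is assumed: the application only verifies that
each index means what it expects before using that index as the priority level.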

Many users out there have deployed, and are using, applications taking advantage
of the ATA NCQ priority feature, with the RT class for high priority IOs. The new
CDL class does not require many application changes to be enabled for the
next-generation drives that will have CDL.
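As an illustration of how small that change is, here is a sketch of setting a
per-process I/O priority with the existing ioprio_set(2) system call. The
encoding (class shifted left by 13, level in the low bits) and
IOPRIO_WHO_PROCESS follow <linux/ioprio.h>; IOPRIO_CLASS_DL = 4 is the value
proposed in this series and is an assumption here, and set_process_ioprio() is
a hypothetical helper. A kernel without the series is expected to reject class 4
with EINVAL.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <unistd.h>

/* Encoding from <linux/ioprio.h>: class in the upper bits, level below. */
#define IOPRIO_CLASS_SHIFT 13
#define IOPRIO_PRIO_VALUE(class, data) (((class) << IOPRIO_CLASS_SHIFT) | (data))
#define IOPRIO_WHO_PROCESS 1

#define IOPRIO_CLASS_RT 1
/* Proposed by this series; not part of a released kernel ABI. */
#define IOPRIO_CLASS_DL 4

/* Hypothetical helper: set the I/O priority of the calling process.
 * glibc provides no ioprio_set() wrapper, hence syscall(2); who = 0
 * means the caller. Returns 0 on success, -1 with errno set on error. */
static int set_process_ioprio(int class, int level)
{
	return (int)syscall(SYS_ioprio_set, IOPRIO_WHO_PROCESS, 0,
			    IOPRIO_PRIO_VALUE(class, level));
}
```

An application that today calls set_process_ioprio(IOPRIO_CLASS_RT, 0) to get
NCQ high priority would instead call, for example,
set_process_ioprio(IOPRIO_CLASS_DL, 3), where 3 stands for whichever descriptor
index the drive configuration assigns to the desired limit. fio's --cmdprio
options apply the same encoded value per I/O rather than per process.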

> 
> Bart.
> 

-- 
Damien Le Moal
Western Digital Research
