Re: [PATCH v3 01/18] block: introduce duration-limits priority class

On 1/26/23 17:40, Damien Le Moal wrote:
On 1/27/23 09:18, Damien Le Moal wrote:
On 1/27/23 02:33, Bart Van Assche wrote:
How about only supporting a subset of the standard such that it becomes
easy to map CDLs to host side priority levels?

I am opposed to this, for several reasons:

1) We are seeing different use cases from users that cover a wide range of
use of CDL descriptors with various definitions.

2) Passthrough commands can be used by a user to change a drive's CDL
descriptors without the kernel knowing about it, unless we spend our time
revalidating the CDL descriptor log page(s)...

3) The CDL standard as is is actually very sensible and not overloaded with
stuff that is only useful in niche use cases. For each CDL descriptor, you
have:
  * The active time limit, which is a clean way to specify how much time
you allow a drive to deal with bad sectors (mostly read case). A typical
HDD will try very hard to recover data from a sector, always. As a result,
the HDD may spend up to several seconds reading a sector again and again
applying different signal processing techniques until it gets the sector
ECC checked to return valid data. That of course can hugely increase an IO
latency seen by the host. In applications such as erasure coded
distributed object stores, maximum latency for an object access can thus
be kept low using this limit without compromising the data since the
object can always be rebuilt from the erasure codes if one HDD is slow to
respond. This limit is also interesting for video streaming/playback to
avoid video buffer underflow (at the expense of maybe some block noise
depending on the codec).
  * The inactive time limit can be used to tell the drive how long it is
allowed to let a command stand in the drive internal queue before
processing. This is thus a parameter that allows a host to tune the drive
RPO optimization (rotational positioning optimization, e.g. HDD internal
command scheduling based on angular sector position on tracks with the
head current position). This is a neat way to control max IOPS vs tail
latency since drives tend to privilege maximizing IOPS over lowering max
tail latency.
  * The duration guideline limit defines an overall time limit for a
command without distinguishing between active and inactive time. It is the
easiest to use (the easiest one to understand from a beginner user point
of view). This is a neat way to define an intelligent IO prioritization in
fact, way better than RT class scheduling on the host or the use of ATA
NCQ high priority, as it provides more information to the drive about the
urgency of a particular command. That allows the drive to still perform
RPO to maximize IOPS without long tail latencies. Chaining such limit with
an active+inactive time limit descriptor using the "next limit" policy
(0x1 policy) can also finely define what the drive should do if the guideline
limit is exceeded (as the next descriptor can define what to do based on
the reason for the limit being exceeded: long internal queueing vs bad
sector long access time).

Note that all 3 limits can be used in a single CDL descriptor to precisely
define how a command should be processed by the device. That is why it is
nearly impossible to come up with a meaningful ordering of CDL descriptors
as an increasing set of priority levels.
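
To make the ordering problem concrete, here is a minimal sketch in C. The
struct layout, the field names, the microsecond units and the "0 means no
limit" convention are assumptions made for illustration; this is not the
on-the-wire descriptor format from the standard nor code from this series.
It only shows that two descriptors constraining different limits cannot be
ranked against each other:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical, simplified view of one CDL descriptor.
 * Assumption for this sketch: a value of 0 means "no limit".
 */
struct cdl_desc {
	uint32_t active_us;	/* active time limit */
	uint32_t inactive_us;	/* inactive time limit */
	uint32_t duration_us;	/* duration guideline limit */
};

/* True if limit a is at least as strict as limit b (0 = unlimited). */
static bool limit_le(uint32_t a, uint32_t b)
{
	if (b == 0)
		return true;	/* anything is at least as strict as "no limit" */
	if (a == 0)
		return false;	/* a is unlimited but b is not */
	return a <= b;
}

/* True if a is at least as strict as b on all three limits. */
static bool cdl_at_least_as_strict(const struct cdl_desc *a,
				   const struct cdl_desc *b)
{
	assert(a && b);
	return limit_le(a->active_us, b->active_us) &&
	       limit_le(a->inactive_us, b->inactive_us) &&
	       limit_le(a->duration_us, b->duration_us);
}

/* Two descriptors can be ranked only if one is at least as strict as the other. */
static bool cdl_comparable(const struct cdl_desc *a, const struct cdl_desc *b)
{
	return cdl_at_least_as_strict(a, b) || cdl_at_least_as_strict(b, a);
}
```

With these helpers, a descriptor that only sets an active time limit and one
that only sets an inactive time limit are not comparable, so neither can be
called the "higher priority" one.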

A summary of my concerns is as follows:
* The current I/O priority levels (RT, BE, IDLE) apply to all block
  devices. IOPRIO_CLASS_DL is only supported by certain block devices
  (some but not all SCSI hard disks). This forces applications to check
  the capabilities of the storage device before it can be decided whether
  or not IOPRIO_CLASS_DL can be used. This is not something applications
  should do but something the kernel should do. Additionally, if multiple
  dm devices are stacked on top of the block device driver, like in
  Android, it becomes even more cumbersome to check whether or not the
  block device supports CDL.
* For the RT, BE and IDLE classes, it is well defined which priority
  number represents a high priority and which priority number represents
  a low priority. For CDL, only the drive knows the priority details. I
  think that application software should be able to select a DL priority
  without having to read the CDL configuration first.
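
As a rough sketch of the burden described above, this is what every
application would end up writing. The constants are defined locally to keep
the example self-contained; the class shift and the RT/BE/IDLE values follow
the existing ioprio encoding, while the value 4 for IOPRIO_CLASS_DL, the
pick_ioprio() helper and its dev_supports_cdl flag are assumptions for
illustration. How the application would learn that flag, especially through
stacked dm devices, is exactly the open question:

```c
#include <assert.h>
#include <stdbool.h>

#define IOPRIO_CLASS_SHIFT	13
#define IOPRIO_PRIO_VALUE(class, data) \
	(((class) << IOPRIO_CLASS_SHIFT) | (data))

enum {
	IOPRIO_CLASS_NONE,
	IOPRIO_CLASS_RT,
	IOPRIO_CLASS_BE,
	IOPRIO_CLASS_IDLE,
	IOPRIO_CLASS_DL,	/* proposed class; the value 4 is assumed here */
};

/*
 * Hypothetical helper: pick an I/O priority value, falling back to the
 * best-effort class when the underlying device has no CDL support.
 */
static int pick_ioprio(bool dev_supports_cdl, int cdl_index, int be_level)
{
	assert(cdl_index >= 0 && be_level >= 0 && be_level < 8);

	if (dev_supports_cdl)
		return IOPRIO_PRIO_VALUE(IOPRIO_CLASS_DL, cdl_index);

	return IOPRIO_PRIO_VALUE(IOPRIO_CLASS_BE, be_level);
}
```

Note that even on the CDL path the application still has to know what the
descriptor behind cdl_index means on that particular drive.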

I hope that I have made it clear that I think that the proposed user space
API will be very painful to use for application developers.

Bart.



