Re: [PATCH block-5.14] Revert "block/mq-deadline: Add cgroup support"

Damien Le Moal <Damien.LeMoal@xxxxxxx> · Fri, 13 Aug 2021 02:18:04 +0000

On 2021/08/13 4:23, Tejun Heo wrote:
> Hello,
> 
> On Thu, Aug 12, 2021 at 11:16:37AM -0700, Bart Van Assche wrote:
>> Are you perhaps referring to the iocost and iolatency cgroup controllers? Is
> 
> and blk-throtl
> 
>> it ever useful to combine these controllers with the ioprio controller? The
> 
> So, I used controller as in cgroup IO controller, something which can
> distribute IO resources hierarchically according to either weights or
> limits. In that sense, there is no such thing as an ioprio controller.
> 
>> ioprio controller was introduced with the goal to provide the information to
>> the storage controller about which I/O request to handle first. My
> 
> My experience has been that this isn't all that useful for generic IO
> control scenarios involving cgroups. The configuration is too flexible
> and granular to map to hardware priorities and there are way more
> significant factors than how a controller manages its internal ordering
> on most commodity SSDs. For situations where such feature is useful,
> cgroup might be able to help with tagging the associated priorities
> but I don't think there's much beyond that and I have a hard time
> seeing why the existing ioprio interface wouldn't be enough.
> 
>> understanding of the iocost and iolatency controllers is that these cgroup
>> controllers decide in which order to process I/O requests. Neither
>> controller has the last word over I/O order if the queue depth is larger
>> than one, something that is essential to achieve reasonable performance.
> 
> Well, whoever owns the queue can control the latencies and it isn't
> difficult to mess up while issuing one command at a time, so if the
> strategy is stuffing device queue as much as possible and telling the
> device what to do, the end result is gonna be pretty sad.

Let me throw in more information related to this.

Command duration limits (CDL) and Sequestered commands features are being
drafted in SPC/SBC and ACS to give the device better hints than just an on/off
high priority bit. I am currently prototyping these features and I am reusing
the ioprio interface for that. Here is how this works:
1) The drive exposes a set of command duration limit descriptors (up to 7 for
reads and 7 for writes) that define duration limits for command execution:
overall processing time, queuing time and execution time. Each limit has a
policy associated with it that is applied if the processing of a command exceeds
one of the defined time limits: continue, continue but signal limit exceeded, abort.
2) Users can change the drive's command duration limits to whatever they need
(e.g. change the policies for the limits to get a fast-fail behavior for
commands instead of having the drive retry for a long time).
3) When issuing IOs, users (or FSes) can apply a command duration limit
descriptor by specifying the IOPRIO_CLASS_DL priority class. The priority level
for that class indicates the descriptor to apply to the command.
4) At SCSI/ATA level, read and write commands have 3 bits defined to specify the
command descriptor to apply to the command (1 to 7 or 0 for no limit)
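To make steps 3) and 4) concrete, here is a minimal sketch of how a descriptor
index could travel from the ioprio value down to the 3-bit command field. It
assumes the standard Linux ioprio layout (class in the bits above
IOPRIO_CLASS_SHIFT = 13, level in the low bits); the numeric value given to
IOPRIO_CLASS_DL and the helper names are hypothetical, chosen only for
illustration, and are not a mainline API.

```c
#include <assert.h>
#include <stdint.h>

/* Standard Linux ioprio layout: class above bit 13, class data below it. */
#define IOPRIO_CLASS_SHIFT 13
#define IOPRIO_PRIO_MASK ((1U << IOPRIO_CLASS_SHIFT) - 1)

/* Hypothetical class number for the CDL prototype; not a mainline value. */
#define IOPRIO_CLASS_DL 4

/* Up to 7 descriptors per direction; 0 in the command means "no limit". */
#define CDL_MAX_DESC 7

/* Build an ioprio value from a class and a level (hypothetical helper). */
static inline uint16_t ioprio_value(unsigned int class, unsigned int level)
{
	assert(class < 8); /* the class must fit in the top 3 bits */
	return (uint16_t)((class << IOPRIO_CLASS_SHIFT) |
			  (level & IOPRIO_PRIO_MASK));
}

/*
 * Return the 3-bit descriptor index to put in the READ/WRITE command:
 * the ioprio level when the class is IOPRIO_CLASS_DL and the level is a
 * valid descriptor number (1..7), otherwise 0 (no duration limit).
 */
static inline unsigned int cdl_desc_from_ioprio(uint16_t ioprio)
{
	unsigned int class = ioprio >> IOPRIO_CLASS_SHIFT;
	unsigned int level = ioprio & IOPRIO_PRIO_MASK;

	if (class != IOPRIO_CLASS_DL || level < 1 || level > CDL_MAX_DESC)
		return 0;
	return level & 0x7;
}
```

With this encoding, an I/O submitted with class IOPRIO_CLASS_DL and level 3
would carry descriptor 3, while any other class (or an out-of-range level)
falls back to 0, i.e. no limit.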

With that in place, the disk firmware can now make more intelligent decisions on
command scheduling to keep performance high at high queue depth without
increasing latency for commands that have low duration limits. And based on the
policy defined for a limit, this can be a "soft" best-effort optimization by the
disk, or a hard one with aborts if the drive decides that what the user is
asking for is not possible.
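The soft versus hard behavior above comes down to which policy is attached to
the limit that a command exceeds. The sketch below models one descriptor as
three limits (overall, queuing, execution) and returns the most severe action
triggered. The struct layout, enum values and function names are hypothetical
and only mirror the three policies listed earlier; the real descriptor format
is whatever the draft SPC/SBC/ACS text ends up defining.

```c
#include <assert.h>
#include <stdint.h>

/* The three policies described above; numeric values are illustrative. */
enum cdl_policy { CDL_CONTINUE, CDL_CONTINUE_SIGNAL, CDL_ABORT };

/* Ordered by severity so that the larger value wins. */
enum cdl_action { CDL_ACT_NONE, CDL_ACT_SIGNAL, CDL_ACT_ABORT };

/* One limit (0 = no limit) and the policy applied when it is exceeded. */
struct cdl_limit {
	uint32_t limit_us;
	enum cdl_policy policy;
};

/* Hypothetical descriptor: overall, queuing and execution time limits. */
struct cdl_desc {
	struct cdl_limit total;
	struct cdl_limit queued;
	struct cdl_limit exec;
};

static enum cdl_action check_limit(const struct cdl_limit *l, uint32_t elapsed_us)
{
	if (l->limit_us == 0 || elapsed_us <= l->limit_us)
		return CDL_ACT_NONE;
	switch (l->policy) {
	case CDL_ABORT:
		return CDL_ACT_ABORT;  /* hard limit: give up on the command */
	case CDL_CONTINUE_SIGNAL:
		return CDL_ACT_SIGNAL; /* soft limit: complete, but report it */
	default:
		return CDL_ACT_NONE;   /* best effort: keep going silently */
	}
}

/* Most severe action across the three limits of a descriptor. */
static enum cdl_action cdl_evaluate(const struct cdl_desc *d,
				    uint32_t queued_us, uint32_t exec_us)
{
	enum cdl_action a, b, c, worst;

	assert(d);
	a = check_limit(&d->total, queued_us + exec_us);
	b = check_limit(&d->queued, queued_us);
	c = check_limit(&d->exec, exec_us);
	worst = a > b ? a : b;
	return worst > c ? worst : c;
}
```

A descriptor whose only limit uses "continue but signal" gives the soft,
best-effort behavior; switching that same limit to "abort" turns it into the
hard, fast-fail behavior.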

CDL can completely replace the existing binary on/off NCQ priority in a more
flexible manner as the user can set different duration limits for high vs low
priority. E.g. high priority is equivalent to a very short limit while low
priority is equivalent to longer or no limits.

I think that CDL has the potential for better interactions with cgroups as
cgroup controllers can install a set of limits on the drive that fits the
controller target policy. E.g., the latency controller can set duration limits
and use the IOPRIO_CLASS_DL class to tell the drive the exact latency target to use.
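As a rough illustration of how a controller could use installed limits, the
sketch below picks, for a given latency target, the descriptor with the loosest
overall limit that still meets the target, and falls back to 0 (no limit) when
none fits. This selection rule and the helper name are assumptions made for
the example, not part of any existing controller. The same idea covers the
high/low priority case: high priority maps to a descriptor with a short limit,
low priority to a longer one or to no limit at all.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CDL_NR_DESC 7

/*
 * Hypothetical helper: limits_us[i] is the overall duration limit installed
 * for descriptor i + 1 (0 = descriptor unused). Returns the descriptor index
 * (1..7) with the largest limit that is still <= target_us, or 0 if no
 * descriptor meets the target.
 */
static unsigned int cdl_pick_desc(const uint32_t limits_us[CDL_NR_DESC],
				  uint32_t target_us)
{
	unsigned int best = 0;
	uint32_t best_limit = 0;

	assert(limits_us != NULL);
	for (unsigned int i = 0; i < CDL_NR_DESC; i++) {
		uint32_t l = limits_us[i];

		/* Skip unused descriptors and those looser than the target. */
		if (l != 0 && l <= target_us && l > best_limit) {
			best = i + 1;
			best_limit = l;
		}
	}
	return best;
}
```

Choosing the loosest limit that still satisfies the target avoids
over-constraining the drive, which leaves its firmware more room to reorder
commands at high queue depth.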

In my implementation, I have not yet looked into cgroups integration for CDL
though. I am still wondering what the best approach is: defining a new
controller or integrating into existing controllers. The former is likely easier
than the latter, but having hardware support for existing controllers has the
potential to improve them seamlessly without forcing users to change anything
in their application setup.

CDL is still in draft state in the specs though. So I will not be sending this yet.

> 
> Thanks.
> 

-- 
Damien Le Moal
Western Digital Research