On 8/12/21 11:51 AM, Tejun Heo wrote:
> On Wed, Aug 11, 2021 at 01:22:20PM -0700, Bart Van Assche wrote:
>> On 8/11/21 12:14 PM, Tejun Heo wrote:
>>> On Wed, Aug 11, 2021 at 11:49:10AM -0700, Bart Van Assche wrote:
>>>> You write that this isn't the right way to collect per cgroup stats. What is
>>>> the "right way"? Has this been documented somewhere?
>>>
>>> Well, there's nothing specific to mq-deadline or any other elevator or
>>> controller about the stats that your patch collected and showed. That
>>> seems like a pretty straightforward sign that it likely doesn't
>>> belong there.
>>
>> Do you perhaps want these statistics to be reported via read-only cgroup
>> attributes of a new cgroup policy that is independent of any particular I/O
>> scheduler?
>
> There's an almost fundamental conflict between ioprio and cgroup IO
> control. bfq layers it so that ioprio classes define the global
> priority above weights and then ioprio modifies the weights within
> each class. mq-deadline isn't cgroup aware and who knows what kind of
> priority inversions it's creating when its ioprio enforcement is
> interacting with other cgroup controllers.
>
> The problem is that, as currently used, they're specifying the same
> thing - how IO should be distributed globally in the system - and
> there's no right way to make the two configuration regimes agree on
> what should happen on the system.
>
> I can see two paths forward:
>
> 1. Accept that ioprio isn't something which makes sense with cgroup IO
>    control in a generic manner and approach it in a per-configuration
>    manner, either by doing whatever the specific combination decided
>    to do with ioprio or ignoring it.
>
> 2. The only generic way to integrate ioprio and cgroup IO control
>    would be nesting ioprio inside cgroup IO control, so that ioprio
>    can express per-process priority within each cgroup. While this
>    makes semantic sense and can be useful in certain scenarios, it is
>    also a departure from how people have been using ioprio, and it
>    would involve quite a bit of effort and complexity, likely too much
>    to be justified by its inherent usefulness.
>
> Jens, what do you think?

On the surface, #2 makes the most sense. But you'd then have to apply
some scaling before it reaches the hardware side or is factored in by
the underlying scheduler, or you could have a high-priority request
from a cgroup that has a small share of the total resources end up
being regarded as more important than a lower-priority request from a
cgroup that has a much higher share of the total resources. Hence I'm
not really sure it makes a lot of sense... We could probably come up
with some heuristics that make some sense, but they'd still just be
heuristics.

--
Jens Axboe
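
To make the inversion described above concrete, here is a minimal
userspace sketch. The weight range, the ioprio levels, and both
ranking formulas are made-up assumptions for illustration only, not
kernel code: a high-priority request from a 10%-share cgroup out-ranks
a low-priority request from a 90%-share cgroup unless the cgroup's
share is folded into the comparison somehow.

/*
 * Hypothetical sketch: shows why nested ioprio needs scaling by the
 * cgroup's share before it can be compared globally.
 *
 * Assumptions (illustrative only): weights roughly follow the cgroup
 * v2 io.weight idea of a relative share, ioprio levels run from
 * 0 (highest) to 7 (lowest), and both rank formulas are invented.
 */
#include <stdio.h>

struct req {
	const char *cgroup;
	unsigned int weight;	/* relative share of the device */
	unsigned int ioprio;	/* 0 = highest, 7 = lowest */
};

/* Naive: ioprio alone decides global order, ignoring cgroup shares. */
static int naive_rank(const struct req *r)
{
	return r->ioprio;
}

/*
 * Scaled: ioprio only spreads priority within the slice the cgroup
 * already owns, so a low-share cgroup cannot out-rank a high-share one.
 */
static double scaled_rank(const struct req *r)
{
	return (double)(r->ioprio + 1) / r->weight;
}

int main(void)
{
	struct req a = { "small-share", 100, 0 };	/* high prio, 10% share */
	struct req b = { "big-share",   900, 7 };	/* low prio, 90% share */

	printf("naive:  %s=%d %s=%d (lower wins -> %s)\n",
	       a.cgroup, naive_rank(&a), b.cgroup, naive_rank(&b),
	       naive_rank(&a) < naive_rank(&b) ? a.cgroup : b.cgroup);
	printf("scaled: %s=%.4f %s=%.4f (lower wins -> %s)\n",
	       a.cgroup, scaled_rank(&a), b.cgroup, scaled_rank(&b),
	       scaled_rank(&a) < scaled_rank(&b) ? a.cgroup : b.cgroup);
	return 0;
}

Under the naive rank the small-share cgroup's request wins purely on
ioprio, which is exactly the inversion being worried about; once the
share is factored in, the big-share cgroup's request wins again. Any
real scaling rule would of course be a heuristic, as noted above.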