On 8/12/21 11:51 AM, Tejun Heo wrote:
> On Wed, Aug 11, 2021 at 01:22:20PM -0700, Bart Van Assche wrote:
>> On 8/11/21 12:14 PM, Tejun Heo wrote:
>>> On Wed, Aug 11, 2021 at 11:49:10AM -0700, Bart Van Assche wrote:
>>>> You write that this isn't the right way to collect per cgroup stats. What is
>>>> the "right way"? Has this been documented somewhere?
>>>
>>> Well, there's nothing specific to mq-deadline or any other elevator or
>>> controller about the stats that your patch collected and showed. That
>>> seems like a pretty straightforward sign that it likely doesn't
>>> belong there.
>>
>> Do you perhaps want these statistics to be reported via read-only cgroup
>> attributes of a new cgroup policy that is independent of any particular I/O
>> scheduler?
>
> There's an almost fundamental conflict between ioprio and cgroup IO
> control. bfq layers it so that ioprio classes define the global
> priority above weights and then ioprio modifies the weights within
> each class. mq-deadline isn't cgroup aware and who knows what kind of
> priority inversions it's creating when its ioprio enforcement is
> interacting with other cgroup controllers.
>
> The problem is that, as currently used, they're specifying the same
> thing - how IO should be distributed globally in the system - and
> there's no right way to make the two configuration regimes agree on
> what should happen on the system.
>
> I can see two paths forward:
>
> 1. Accept that ioprio isn't something which makes sense with cgroup IO
>    control in a generic manner and approach it in a per-configuration
>    manner, either by doing whatever the specific combination decided
>    to do with ioprio or ignoring it.
>
> 2. The only generic way to integrate ioprio and cgroup IO control
>    would be nesting ioprio inside cgroup IO control, so that ioprio
>    can express per-process priority within each cgroup. While this
>    makes semantic sense and can be useful in certain scenarios, it is
>    also a departure from how people have been using ioprio, and it
>    would involve quite a bit of effort and complexity, likely too much
>    to be justified by its inherent usefulness.
>
> Jens, what do you think?

On the surface, #2 makes the most sense. But you'd then have to apply
some scaling before it reaches the hardware side or is factored in by
the underlying scheduler, or you could have a high-priority request
from a cgroup that has a small share of the total resources end up
being regarded as more important than a lower-priority request from a
cgroup that has a much higher share of the total resources. Hence I'm
not really sure it makes a lot of sense... We could probably come up
with some heuristics that make some sense, but they'd still just be
heuristics.

--
Jens Axboe
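
To make the inversion described above concrete, here is a minimal
userspace sketch. The weight range, the ioprio levels, and both
ranking formulas are made-up assumptions for illustration only, not
kernel code: a high-priority request from a 10%-share cgroup out-ranks
a low-priority request from a 90%-share cgroup unless the cgroup's
share is folded into the comparison somehow.

/*
 * Hypothetical sketch: shows why nested ioprio needs scaling by the
 * cgroup's share before it can be compared globally.
 *
 * Assumptions (illustrative only): weights roughly follow the cgroup
 * v2 io.weight idea of a relative share, ioprio levels run from
 * 0 (highest) to 7 (lowest), and both rank formulas are invented.
 */
#include <stdio.h>

struct req {
	const char *cgroup;
	unsigned int weight;	/* relative share of the device */
	unsigned int ioprio;	/* 0 = highest, 7 = lowest */
};

/* Naive: ioprio alone decides global order, ignoring cgroup shares. */
static int naive_rank(const struct req *r)
{
	return r->ioprio;
}

/*
 * Scaled: ioprio only spreads priority within the slice the cgroup
 * already owns, so a low-share cgroup cannot out-rank a high-share one.
 */
static double scaled_rank(const struct req *r)
{
	return (double)(r->ioprio + 1) / r->weight;
}

int main(void)
{
	struct req a = { "small-share", 100, 0 };	/* high prio, 10% share */
	struct req b = { "big-share",   900, 7 };	/* low prio, 90% share */

	printf("naive:  %s=%d %s=%d (lower wins -> %s)\n",
	       a.cgroup, naive_rank(&a), b.cgroup, naive_rank(&b),
	       naive_rank(&a) < naive_rank(&b) ? a.cgroup : b.cgroup);
	printf("scaled: %s=%.4f %s=%.4f (lower wins -> %s)\n",
	       a.cgroup, scaled_rank(&a), b.cgroup, scaled_rank(&b),
	       scaled_rank(&a) < scaled_rank(&b) ? a.cgroup : b.cgroup);
	return 0;
}

Under the naive rank the small-share cgroup's request wins purely on
ioprio, which is exactly the inversion being worried about; once the
share is factored in, the big-share cgroup's request wins again. Any
real scaling rule would of course be a heuristic, as noted above.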