Qais Yousef <qyousef@xxxxxxxxxxx> writes:

> The new util clamp feature needs a document explaining what it is and
> how to use it. The new document hopefully covers everything one needs to
> know about uclamp.
>
> Signed-off-by: Qais Yousef <qais.yousef@xxxxxxx>
> Signed-off-by: Qais Yousef (Google) <qyousef@xxxxxxxxxxx>
> ---
>
> Changes in v2:
>
> 	* Address various style comments from Bagas

Just a handful of nits - this looks like a good addition to our
documentation, thanks.

>  Documentation/scheduler/index.rst            |   1 +
>  Documentation/scheduler/sched-util-clamp.rst | 732 +++++++++++++++++++
>  2 files changed, 733 insertions(+)
>  create mode 100644 Documentation/scheduler/sched-util-clamp.rst
>
> diff --git a/Documentation/scheduler/index.rst b/Documentation/scheduler/index.rst
> index b430d856056a..f12d0d06de3a 100644
> --- a/Documentation/scheduler/index.rst
> +++ b/Documentation/scheduler/index.rst
> @@ -15,6 +15,7 @@ Linux Scheduler
>      sched-capacity
>      sched-energy
>      schedutil
> +    sched-util-clamp
>      sched-nice-design
>      sched-rt-group
>      sched-stats
> diff --git a/Documentation/scheduler/sched-util-clamp.rst b/Documentation/scheduler/sched-util-clamp.rst
> new file mode 100644
> index 000000000000..da1881e293c3
> --- /dev/null
> +++ b/Documentation/scheduler/sched-util-clamp.rst
> @@ -0,0 +1,732 @@
> +====================
> +Utilization Clamping
> +====================

RST files, too, should have SPDX tags at the beginning.  The practice
rather lags the rule here, but we're trying...

> +1. Introduction
> +================
> +
> +Utilization clamping is a scheduler feature that allows user space to help in
> +managing the performance requirement of tasks. It was introduced in v5.3
> +release. The CGroup support was merged in v5.4.
> +
> +It is often referred to as util clamp and uclamp. You'll find all variations
> +used interchangeably in this documentation and in the source code.
> +
> +Uclamp is a hinting mechanism that allows the scheduler to understand the
> +performance requirements and restrictions of the tasks. Hence help it make

...of the tasks, and thus help it ...

> +a better placement decision. And when schedutil cpufreq governor is used, util
> +clamp will influence the frequency selection as well.
> +
> +Since scheduler and schedutil are both driven by PELT (util_avg) signals, util

Since *the* scheduler

> +clamp acts on that to achieve its goal by clamping the signal to a certain
> +point; hence the name. I.e: by clamping utilization we are making the system
> +run at a certain performance point.
> +
> +The right way to view util clamp is as a mechanism to make performance
> +constraints request/hint. It consists of two components:
> +
> +  * UCLAMP_MIN, which sets a lower bound.
> +  * UCLAMP_MAX, which sets an upper bound.
> +
> +These two bounds will ensure a task will operate within this performance range
> +of the system. UCLAMP_MIN implies boosting a task, while UCLAMP_MAX implies
> +capping a task.
> +
> +One can tell the system (scheduler) that some tasks require a minimum
> +performance point to operate at to deliver the desired user experience. Or one
> +can tell the system that some tasks should be restricted from consuming too
> +much resources and should NOT go above a specific performance point. Viewing
> +the uclamp values as performance points rather than utilization is a better
> +abstraction from user space point of view.
> +
> +As an example, a game can use util clamp to form a feedback loop with its
> +perceived FPS. It can dynamically increase the minimum performance point

FPS?

> +required by its display pipeline to ensure no frame is dropped. It can also
> +dynamically 'prime' up these tasks if it knows in the coming few 100ms
> +a computationally intensive scene is about to happen.
> +
> +On mobile hardware where the capability of the devices varies a lot, this
> +dynamic feedback loop offers a great flexibility in ensuring best user
> +experience given the capabilities of any system.
> +
> +Of course a static configuration is possible too. The exact usage will depend
> +on the system, application and the desired outcome.
> +
> +Another example is in Android where tasks are classified as background,
> +foreground, top-app, etc. Util clamp can be used to constraint how much

to *constrain* how

> +resources background tasks are consuming by capping the performance point they
> +can run at. This constraint helps reserve resources for important tasks, like
> +the ones belonging to the currently active app (top-app group). Beside this
> +helps in limiting how much power they consume. This can be more obvious in
> +heterogeneous systems; the constraint will help bias the background tasks to
> +stay on the little cores which will ensure that:
> +
> +        1. The big cores are free to run top-app tasks immediately. top-app
> +           tasks are the tasks the user is currently interacting with, hence
> +           the most important tasks in the system.
> +        2. They don't run on a power hungry core and drain battery even if they
> +           are CPU intensive tasks.
> +
> +By making these uclamp performance requests, or rather hints, user space can
> +ensure system resources are used optimally to deliver the best user experience
> +the system is capable of.
> +
> +Another use case is to help with overcoming the ramp up latency inherit in how
> +scheduler utilization signal is calculated.
> +
> +A busy task for instance that requires to run at maximum performance point will
> +suffer a delay of ~200ms (PELT HALFIFE = 32ms) for the scheduler to realize
> +that. This is known to affect workloads like gaming on mobile devices where
> +frames will drop due to slow response time to select the higher frequency
> +required for the tasks to finish their work in time.
> +
> +The overall visible effect goes beyond better perceived user
> +experience/performance and stretches to help achieve a better overall
> +performance/watt if used effectively.
> +
> +User space can form a feedback loop with thermal subsystem too to ensure the

with *the* thermal subsystem

> +device doesn't heat up to the point where it will throttle.
> +
> +Both SCHED_NORMAL/OTHER and SCHED_FIFO/RR honour uclamp requests/hints.
> +
> +In SCHED_FIFO/RR case, uclamp gives the option to run RT tasks at any

In *the* SCHED_FIFO...

> +performance point rather than being tied to MAX frequency all the time. Which
> +can be useful on general purpose systems that run on battery powered devices.
> +
> +Note that by design RT tasks don't have per-task PELT signal and must always
> +run at a constant frequency to combat undeterministic DVFS rampup delays.
> +
> +Note that using schedutil always implies a single delay to modify the frequency
> +when an RT task wakes up. This cost is unchanged by using uclamp. Uclamp only
> +helps picking what frequency to request instead of schedutil always requesting
> +MAX for all RT tasks.
> +
> +See section 3.4 for default values and 3.4.1 on how to change RT tasks default
> +value.
> +
> +2. Design
> +==========
> +
> +Util clamp is a property of every task in the system. It sets the boundaries of
> +its utilization signal; acting as a bias mechanism that influences certain
> +decisions within the scheduler.
> +
> +The actual utilization signal of a task is never clamped in reality. If you
> +inspect PELT signals at any point of time you should continue to see them as
> +they are intact. Clamping happens only when needed, e.g: when a task wakes up
> +and the scheduler needs to select a suitable CPU for it to run on.
> +
> +Since the goal of util clamp is to allow requesting a minimum and maximum
> +performance point for a task to run on, it must be able to influence the
> +frequency selection as well as task placement to be most effective. Both of
> +which have implications on the utilization value at rq level, which brings us
> +to the main design challenge.
> +
> +When a task wakes up on an rq, the utilization signal of the rq will be
> +impacted by the uclamp settings of all the tasks enqueued on it. For example if
> +a task requests to run at UTIL_MIN = 512, then the util signal of the rq needs
> +to respect this request as well as all other requests from all of the enqueued
> +tasks.
> +
> +To be able to aggregate the util clamp value of all the tasks attached to the
> +rq, uclamp must do some housekeeping at every enqueue/dequeue, which is the
> +scheduler hot path. Hence care must be taken since any slow down will have
> +significant impact on a lot of use cases and could hinder its usability in
> +practice.
> +
> +The way this is handled is by dividing the utilization range into buckets
> +(struct uclamp_bucket) which allows us to reduce the search space from every
> +task on the rq to only a subset of tasks on the top-most bucket.
> +
> +When a task is enqueued, we increment a counter in the matching bucket. And on
> +dequeue we decrement it. This makes keeping track of the effective uclamp value
> +at rq level a lot easier.
> +
> +As we enqueue and dequeue tasks we keep track of the current effective uclamp
> +value of the rq. See section 2.1 for details on how this works.
> +
> +Later at any path that wants to identify the effective uclamp value of the rq,
> +it will simply need to read this effective uclamp value of the rq at that exact
> +moment of time it needs to take a decision.
> +
> +For task placement case, only Energy Aware and Capacity Aware Scheduling
> +(EAS/CAS) make use of uclamp for now. This implies heterogeneous systems only.
> +
> +When a task wakes up, the scheduler will look at the current effective uclamp
> +value of every rq and compare it with the potential new value if the task were
> +to be enqueued there. Favoring the rq that will end up with the most energy
> +efficient combination.
> +
> +Similarly in schedutil, when it needs to make a frequency update it will look
> +at the current effective uclamp value of the rq which is influenced by the set
> +of tasks currently enqueued there and select the appropriate frequency that
> +will honour uclamp requests.
> +
> +Other paths like setting overutilization state (which effectively disables EAS)
> +make use of uclamp as well. Such cases are considered necessary housekeeping to
> +allow the 2 main use cases above and will not be covered in detail here as they
> +could change with implementation details.
> +
> +2.1 Buckets
> +------------
> +
> +::
> +
> +                            [struct rq]
> +
> +  (bottom)                                                    (top)
> +
> +    0                                                          1024
> +    |                                                           |
> +    +-----------+-----------+-----------+----   ----+-----------+
> +    |  Bucket 0 |  Bucket 1 |  Bucket 2 |    ...    |  Bucket N |
> +    +-----------+-----------+-----------+----   ----+-----------+
> +       :           :                                   :
> +       +- p0       +- p3                               +- p4
> +       :           :
> +       +- p1       +- p5
> +       :
> +       +- p2
> +
> +
> +.. note::
> +   The diagram above is an illustration rather than a true depiction of the
> +   internal data structure.
> +
> +To reduce the search space when trying to decide the effective uclamp value of
> +an rq as tasks are enqueued/dequeued, the whole utilization range is divided
> +into N buckets where N is configured at compile time by setting
> +CONFIG_UCLAMP_BUCKETS_COUNT. By default it is set to 5.
> +
> +The rq has a bucket for each uclamp_id: [UCLAMP_MIN, UCLAMP_MAX].
> +
> +The range of each bucket is 1024/N. For example for the default value of 5 we
> +will have 5 buckets, each of which will cover the following range:
> +
> +.. code-block:: c

If you want to minimize markup, you could use basic literal blocks
rather than ..code-block::, which really only has the effect of syntax
coloring in HTML output.  I don't find that worth the extra clutter
myself, but others clearly disagree with me...

> +  DELTA = round_closest(1024/5) = 204.8 = 205
> +
> +  Bucket 0: [0:204]
> +  Bucket 1: [205:409]
> +  Bucket 2: [410:614]
> +  Bucket 3: [615:819]
> +  Bucket 4: [820:1024]
> +

I didn't find anything worth quibbling about after this.

Thanks,

jon