On 2022/1/12 22:43, Peter Zijlstra wrote:
On Mon, Dec 06, 2021 at 10:45:28AM +0800, Gang Li wrote:
This patch adds a new API, PR_NUMA_BALANCING, to prctl.
NUMA balancing triggers a large number of page faults, which cause
performance loss while it is running. Processes that care about worst-case
performance therefore need NUMA balancing disabled. Others, by contrast, accept a
temporary performance loss in exchange for higher average performance, so
enabling NUMA balancing is better for them.
NUMA balancing can currently only be controlled globally, through
/proc/sys/kernel/numa_balancing. Because of the cases above, we want to
disable/enable numa_balancing per process instead.
Add numa_balancing under mm_struct, then use it in task_tick_fair.
Set per-process numa balancing:
prctl(PR_NUMA_BALANCING, PR_SET_NUMAB_DISABLE); //disable
prctl(PR_NUMA_BALANCING, PR_SET_NUMAB_ENABLE); //enable
prctl(PR_NUMA_BALANCING, PR_SET_NUMAB_DEFAULT); //follow global
This seems to imply you can prctl(ENABLE) even if the global is
disabled, IOW sched_numa_balancing is off.
Of course. These semantics were already discussed here, FYI:
https://lore.kernel.org/all/20211118085819.GD3301@xxxxxxx/
On 11/18/21 4:58 PM, Mel Gorman wrote:
> On Thu, Nov 18, 2021 at 11:26:30AM +0800, Gang Li wrote:
>> 3. prctl(PR_NUMA_BALANCING, PR_SET_NUMAB_ENABLE); //enable
>
> If PR_SET_NUMAB_ENABLE enables numa balancing for a task when
> kernel.numa_balancing == 0 instead of returning an error then sure.
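
To spell out the semantics agreed above, here is a minimal user-space sketch of
the decision, not the kernel code itself. The enum, its numeric values, the
NUMAB_DISABLED name and the helper numab_should_scan() are assumptions made for
illustration; only NUMAB_ENABLED and NUMAB_DEFAULT appear in the patch.

```c
#include <stdbool.h>

/*
 * Hypothetical mirror of the per-mm setting (mm->numab_enabled in the patch).
 * The names' numeric values and NUMAB_DISABLED are assumed for this sketch.
 */
enum numab_mode {
	NUMAB_DEFAULT,	/* follow the global kernel.numa_balancing switch */
	NUMAB_DISABLED,	/* never run NUMA balancing for this process */
	NUMAB_ENABLED,	/* always run NUMA balancing for this process */
};

/*
 * Returns true when the scheduler tick would call task_tick_numa().
 * has_mm stands in for "curr->mm != NULL"; global_on stands in for the
 * sched_numa_balancing static branch.
 */
static bool numab_should_scan(bool has_mm, enum numab_mode mode, bool global_on)
{
	if (!has_mm)	/* kernel threads have no mm to scan */
		return false;

	switch (mode) {
	case NUMAB_ENABLED:
		return true;		/* overrides a disabled global switch */
	case NUMAB_DISABLED:
		return false;		/* overrides an enabled global switch */
	case NUMAB_DEFAULT:
		return global_on;	/* defer to the global switch */
	}
	return false;
}
```

With this helper, numab_should_scan(true, NUMAB_ENABLED, false) is true, which
is the case asked about: a per-process ENABLE takes effect even when
kernel.numa_balancing == 0.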
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 884f29d07963..2980f33ac61f 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -11169,8 +11169,12 @@ static void task_tick_fair(struct rq *rq, struct task_struct *curr, int queued)
 		entity_tick(cfs_rq, se, queued);
 	}
 
-	if (static_branch_unlikely(&sched_numa_balancing))
+#ifdef CONFIG_NUMA_BALANCING
+	if (curr->mm && (curr->mm->numab_enabled == NUMAB_ENABLED
+	    || (static_branch_unlikely(&sched_numa_balancing)
+	    && curr->mm->numab_enabled == NUMAB_DEFAULT)))
 		task_tick_numa(rq, curr);
+#endif
 
 	update_misfit_status(curr, rq);
 	update_overutilized_status(task_rq(curr));
There's just about everything wrong there... not least of all the
horrific coding style.
Horrible code, yes.
I'll clean it up.
--
Thanks
Gang Li