On Thu, Mar 25, 2021 at 02:58:52PM -0700, Josh Don wrote:
> > On Wed, Mar 24, 2021 at 01:39:16PM +0000, Mel Gorman wrote:
> > I'm not going to NAK because I do not have hard data that shows they must
> > exist. However, I won't ACK either because I bet a lot of tasty beverages
> > the next time we meet that the following parameters will generate reports
> > if removed.
> >
> > kernel.sched_latency_ns
> > kernel.sched_migration_cost_ns
> > kernel.sched_min_granularity_ns
> > kernel.sched_wakeup_granularity_ns
> >
> > I know they are altered by tuned for different profiles, and some people do
> > go to the effort of creating custom profiles for specific applications. They
> > also show up in "Official Benchmarking" such as SPEC CPU 2017, and
> > some vendors put a *lot* of effort into SPEC CPU results for bragging
> > rights. They show up in technical books and best-practice guides for
> > applications. Finally, they show up in Google when searching for "tuning
> > sched_foo". I'm not saying that any of these are even accurate or a good
> > idea, just that they show up near the top of the results and they are
> > sufficiently popular that they might as well be an ABI.
>
> +1, these seem like sufficiently well-known scheduler tunables, and
> not really SCHED_DEBUG.

So we've never made any guarantees on their behaviour, nor am I willing
to make any.

In fact, I propose we merge the below along with the debugfs move, just
to make absolutely sure any 'tuning' is broken.

---
Subject: sched,fair: Alternative sched_slice()
From: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
Date: Thu Mar 25 13:44:46 CET 2021

The current sched_slice() seems to have issues; there are two things
that could be improved:

 - the 'nr_running' used for __sched_period() is daft when cgroups are
   considered. Using the RQ-wide h_nr_running seems like a much more
   consistent number.

 - (esp.) cgroups can slice it real fine (pun intended), which makes
   for easy over-scheduling; ensure min_gran is what the name says.

Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
---
 kernel/sched/fair.c     |   15 ++++++++++++++-
 kernel/sched/features.h |    3 +++
 2 files changed, 17 insertions(+), 1 deletion(-)

--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -680,7 +680,16 @@ static u64 __sched_period(unsigned long
  */
 static u64 sched_slice(struct cfs_rq *cfs_rq, struct sched_entity *se)
 {
-	u64 slice = __sched_period(cfs_rq->nr_running + !se->on_rq);
+	unsigned int nr_running = cfs_rq->nr_running;
+	u64 slice;
+
+	if (sched_feat(ALT_PERIOD))
+		nr_running = rq_of(cfs_rq)->cfs.h_nr_running;
+
+	slice = __sched_period(nr_running + !se->on_rq);
+
+	if (sched_feat(BASE_SLICE))
+		slice -= sysctl_sched_min_granularity;
 
 	for_each_sched_entity(se) {
 		struct load_weight *load;
@@ -697,6 +706,10 @@ static u64 sched_slice(struct cfs_rq *cf
 		}
 		slice = __calc_delta(slice, se->load.weight, load);
 	}
+
+	if (sched_feat(BASE_SLICE))
+		slice += sysctl_sched_min_granularity;
+
 	return slice;
 }
 
--- a/kernel/sched/features.h
+++ b/kernel/sched/features.h
@@ -90,3 +90,6 @@ SCHED_FEAT(WA_BIAS, true)
  */
 SCHED_FEAT(UTIL_EST, true)
 SCHED_FEAT(UTIL_EST_FASTUP, true)
+
+SCHED_FEAT(ALT_PERIOD, true)
+SCHED_FEAT(BASE_SLICE, true)
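
As an aside, below is a rough user-space sketch (plain C, not kernel code) of
what the ALT_PERIOD/BASE_SLICE arithmetic does to the numbers. The base
(non-scaled) 6ms/0.75ms latency/granularity defaults, the flat weight_ratio
stand-in for the __calc_delta() weight scaling, and the example 1% share are
illustrative assumptions, not anything taken from the patch itself.

/*
 * Illustrative sketch only: compare the old slice computation with the
 * BASE_SLICE variant, which distributes (period - min_gran) by weight and
 * then adds min_gran back, so the result never drops below min_gran.
 */
#include <stdio.h>
#include <stdint.h>

#define SCHED_LATENCY_NS	6000000ULL	/* kernel.sched_latency_ns (base default) */
#define MIN_GRAN_NS		 750000ULL	/* kernel.sched_min_granularity_ns (base default) */
#define NR_LATENCY		(SCHED_LATENCY_NS / MIN_GRAN_NS)

/*
 * __sched_period(): stretch the period once there are more runnable tasks
 * than fit at min_granularity each. With ALT_PERIOD, nr_running would be
 * the rq-wide h_nr_running instead of the local cfs_rq's count.
 */
static uint64_t sched_period(unsigned int nr_running)
{
	if (nr_running > NR_LATENCY)
		return nr_running * MIN_GRAN_NS;
	return SCHED_LATENCY_NS;
}

/* Old behaviour: weight share of the whole period, can get arbitrarily small. */
static uint64_t slice_old(unsigned int nr_running, double weight_ratio)
{
	return (uint64_t)(sched_period(nr_running) * weight_ratio);
}

/*
 * BASE_SLICE behaviour: carve the weight share out of (period - min_gran)
 * and add min_gran back afterwards, so even a tiny weight (e.g. a deeply
 * nested cgroup) still ends up with at least min_granularity.
 */
static uint64_t slice_base(unsigned int nr_running, double weight_ratio)
{
	uint64_t slice = sched_period(nr_running) - MIN_GRAN_NS;

	slice = (uint64_t)(slice * weight_ratio);
	return slice + MIN_GRAN_NS;
}

int main(void)
{
	/* 16 runnable tasks, entity holding a 1% weight share. */
	printf("old slice:        %.3f ms\n", slice_old(16, 0.01) / 1e6);
	printf("BASE_SLICE slice: %.3f ms\n", slice_base(16, 0.01) / 1e6);
	return 0;
}

With those numbers the old formula hands out a ~0.12ms slice, while the
BASE_SLICE variant floors it at ~0.86ms; that is the "ensure min_gran is
what the name says" part. Both features stay runtime-switchable through
the usual sched_features debugfs interface (NO_ALT_PERIOD / NO_BASE_SLICE).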