[Scheduler] CFS - What happens to each task's slice if nr_running * min_granularity > sched_latency?

Hi everyone ~

I've read a few different references on CFS and have been looking through the CFS scheduler code in fair.c. One thing I'm not completely understanding is what happens when so many processes are running that nr_running * min_granularity > sched_latency.

I know that the scheduler will expand the period so that each process can run for at least the min_granularity, but how does that interact with nice numbers? Here's the code for expanding the period:

/*
 * The idea is to set a period in which each task runs once.
 *
 * When there are too many tasks (sched_nr_latency) we have to stretch
 * this period because otherwise the slices get too small.
 *
 * p = (nr <= nl) ? l : l*nr/nl
 */
static u64 __sched_period(unsigned long nr_running)
{
        if (unlikely(nr_running > sched_nr_latency))
                return nr_running * sysctl_sched_min_granularity;
        else
                return sysctl_sched_latency;
}
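
To put rough numbers on it: assuming the stock, un-scaled defaults of sched_latency = 6 ms and min_granularity = 0.75 ms (so sched_nr_latency = 8), the stretched period is just nr_running * min_granularity. The kernel scales these tunables by the CPU count, so the figures below are only illustrative; here's a little userspace sketch of the same calculation:

/*
 * Userspace sketch of __sched_period() with assumed default tunables
 * (sched_latency = 6 ms, min_granularity = 0.75 ms, sched_nr_latency = 8).
 * The real values are scaled by the CPU count, so treat the output as
 * illustrative only.
 */
#include <stdio.h>

static const unsigned long long latency_ns  = 6000000ULL;  /* 6 ms    */
static const unsigned long long min_gran_ns =  750000ULL;  /* 0.75 ms */
static const unsigned long      nr_latency  = 8;           /* latency / min_gran */

static unsigned long long period_ns(unsigned long nr_running)
{
        if (nr_running > nr_latency)
                return nr_running * min_gran_ns;    /* stretched period */
        return latency_ns;
}

int main(void)
{
        /* e.g. 45 runnable tasks -> 45 * 0.75 ms = 33.75 ms */
        printf("period(45) = %.2f ms\n", period_ns(45) / 1e6);
        return 0;
}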


Here's the code for calculating an individual process's slice. It looks like the weighting formula is applied here regardless of whether the period has been expanded. If that's the case, doesn't that mean that some processes will still get a slice smaller than the min_granularity? (See the simplified sketch after the function.)
/*
 * We calculate the wall-time slice from the period by taking a part
 * proportional to the weight.
 *
 * s = p*P[w/rw]
 */
static u64 sched_slice(struct cfs_rq *cfs_rq, struct sched_entity *se)
{
        u64 slice = __sched_period(cfs_rq->nr_running + !se->on_rq);

        for_each_sched_entity(se) {
                struct load_weight *load;
                struct load_weight lw;

                cfs_rq = cfs_rq_of(se);
                load = &cfs_rq->load;

                if (unlikely(!se->on_rq)) {
                        lw = cfs_rq->load;

                        update_load_add(&lw, se->load.weight);
                        load = &lw;
                }
                slice = __calc_delta(slice, se->load.weight, load);
        }
        return slice;
}
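
If I'm reading it right, each loop iteration effectively computes slice = slice * se->load.weight / cfs_rq->load.weight, and nothing clamps the result back up to min_granularity afterwards. Here's my own stripped-down model of that proportional split (it deliberately ignores the fixed-point arithmetic in __calc_delta and any cgroup nesting):

/*
 * Simplified model of sched_slice(): the (possibly stretched) period is
 * split in proportion to each entity's weight.  My own sketch, not the
 * kernel's implementation.
 */
static unsigned long long slice_ns(unsigned long long period,
                                   unsigned long task_weight,
                                   unsigned long total_weight)
{
        return period * task_weight / total_weight;
}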

I ran a test by starting five busy processes with a nice level of -10. Next, I launched ~40 busy processes with a nice level of 0 (all of the processes were pinned to the same CPU). I expected CFS to expand the period and assign each process a slice equal to the min_granularity. However, the 5 processes with nice = -10 still used considerably more CPU than the other processes.
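
Plugging my test into the simplified formula above, and assuming the usual sched_prio_to_weight values (nice 0 -> 1024, nice -10 -> 9548), I get roughly:

    rw     = 5 * 9548 + 40 * 1024     =  88700
    p      = 45 * 0.75 ms             =  33.75 ms
    s(-10) = 33.75 ms * 9548 / 88700 ~=  3.6 ms
    s(0)   = 33.75 ms * 1024 / 88700 ~=  0.39 ms

which would be consistent with what I measured: the nice -10 tasks get several times the CPU of the nice 0 tasks, and the nice 0 slices land well below the 0.75 ms min_granularity. (Again, the real tunables are scaled by CPU count, so the exact numbers will differ.)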

Is __calc_delta in the function above actually scaling each task's slice by its nice weight even after the period has been stretched? The __calc_delta function is a bit difficult to follow, so I haven't quite figured out exactly what it's doing.
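
From skimming it, my rough understanding (which may be wrong) is that __calc_delta() avoids a 64-bit division by keeping a pre-computed inverse weight, inv_weight ~= 2^32 / weight, and then doing a multiply-and-shift, i.e. delta * weight * inv_weight >> 32 ~= delta * weight / rw. Something like this userspace sketch, which skips the overflow handling the real code has to do:

/*
 * Rough model of the __calc_delta() multiply-and-shift trick:
 * inv ~= 2^32 / rw, so (delta * w * inv) >> 32 ~= delta * w / rw.
 * Uses the GCC/Clang __int128 extension to dodge overflow; the kernel
 * instead shifts the factors down as needed.  My sketch, not the
 * kernel's code.
 */
#include <stdint.h>

static uint64_t calc_delta_sketch(uint64_t delta, uint64_t weight,
                                  uint64_t rq_weight)
{
        uint64_t inv = ((uint64_t)1 << 32) / rq_weight;   /* like lw->inv_weight */

        return (uint64_t)(((unsigned __int128)delta * weight * inv) >> 32);
}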

tl;dr I know that CFS expands the period if lots of processes are running. What I'm not sure about is how nice levels affect the slice each task gets if the period has been expanded due to a high number of running tasks.

Thanks!
_______________________________________________
Kernelnewbies mailing list
Kernelnewbies@xxxxxxxxxxxxxxxxx
https://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
