Re: [PATCH] sched/cpuset: distribute tasks within affinity masks

Tejun Heo <tj@xxxxxxxxxx> · Wed, 4 Mar 2020 12:11:51 -0500

On Thu, Feb 27, 2020 at 05:01:34PM -0800, Josh Don wrote:
> From: Paul Turner <pjt@xxxxxxxxxx>
> 
> Currently, when updating the affinity of tasks via either cpuset.cpus
> or sched_setaffinity(), tasks not currently running within the newly
> specified CPU mask will be arbitrarily assigned to the first CPU within
> the mask.
> 
> This (particularly in the case that we are restricting masks) can
> result in many tasks being assigned to the first CPUs of their new
> masks.
> 
> This:
>  1) Can induce scheduling delays until the load-balancer has had a
>     chance to spread the tasks across their new CPUs.
>  2) Can antagonize a poor load-balancer behavior where it has a
>     difficult time recognizing that a cross-socket imbalance has been
>     forced by an affinity mask.
> 
> With this change, tasks are distributed ~evenly across the new mask.  We
> may intentionally move tasks already running on a CPU within the mask to
> avoid edge cases in which a CPU is already overloaded (or would have
> more tasks assigned to it than is desired).
> 
> We specifically apply this behavior to the following cases:
> - modifying cpuset.cpus
> - when tasks join a cpuset
> - when modifying a task's affinity via sched_setaffinity(2)
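
For readers following along, the difference between "first CPU in the
mask" and "distributed across the mask" can be sketched in userspace C.
This is only an illustration under stated assumptions, not the code from
the patch: the 64-bit mask representation and the names pick_first(),
pick_distribute() and NCPUS are hypothetical, and the caller-owned
"prev" cursor stands in for whatever state the kernel side would keep.

```c
#include <stdint.h>

#define NCPUS 64   /* hypothetical: one bit per CPU in a 64-bit mask */

/* Baseline behavior: always return the lowest-numbered CPU in the mask.
 * Returns -1 if the mask is empty. */
static int pick_first(uint64_t mask)
{
    for (int cpu = 0; cpu < NCPUS; cpu++)
        if (mask & (UINT64_C(1) << cpu))
            return cpu;
    return -1;
}

/* Round-robin selection: start searching just after the previously
 * chosen CPU (*prev) and wrap around, so that successive calls spread
 * across the mask. *prev should start at -1 and is updated on success.
 * Returns -1 if the mask is empty. */
static int pick_distribute(uint64_t mask, int *prev)
{
    if (mask == 0)
        return -1;

    for (int i = 1; i <= NCPUS; i++) {
        int cpu = (*prev + i) % NCPUS;

        if (mask & (UINT64_C(1) << cpu)) {
            *prev = cpu;
            return cpu;
        }
    }
    return -1;
}
```

With a mask covering CPUs 2-5, repeated calls to pick_first() keep
returning CPU 2, while repeated calls to pick_distribute() sharing one
cursor cycle through 2, 3, 4, 5 and then wrap back to 2.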

Looks fine to me. Peter, what do you think?

Thanks.

-- 
tejun