Re: [PATCH 3/6] sched_ext: idle: Introduce the concept of allowed CPUs

On Fri, Mar 07, 2025 at 12:17:23PM -1000, Tejun Heo wrote:
> Hello,
> 
> On Fri, Mar 07, 2025 at 09:01:05PM +0100, Andrea Righi wrote:
> > Many scx schedulers define their own concept of scheduling domains to
> > represent topology characteristics, such as heterogeneous architectures
> 
> I'm not sure "domain" is a good choice given that sched_domain is already an
> established construct in kernel and means something specific.

Yeah, I agree, we don't want to create ambiguity with sched_domain.
How about CPU groups or CPU partitions?

> 
> > (e.g., big.LITTLE, P-cores/E-cores), or to categorize tasks based on
> > specific properties (e.g., setting the soft-affinity of certain tasks to
> > a subset of CPUs).
> > 
> > Currently, there is no mechanism to share these domains with the
> > built-in idle CPU selection policy. As a result, schedulers often
> > implement their own idle CPU selection policies, which are typically
> > similar to one another, leading to a lot of code duplication.
> > 
> > To address this, introduce the concept of an allowed domain (represented as
> > a cpumask) that can be used by the BPF schedulers to apply the built-in
> > idle CPU selection policy to a subset of preferred CPUs.
> 
> We don't need a new term here, do we? All that's being added is an extra
> mask when picking CPUs.

Right, in the end it's just a cpumask. I'll rephrase this part.

> 
> > With this concept the idle CPU selection policy becomes the following:
> >  - always prioritize CPUs from fully idle SMT cores (if SMT is enabled),
> >  - select the same CPU if it's idle and in the allowed domain,
> >  - select an idle CPU within the same LLC domain, if the LLC domain is a
> >    subset of the allowed domain,
> 
> Why not select from the intersection of the same LLC domain and the cpumask?

We could do that, but to guarantee the intersection we'd need extra
temporary cpumasks (one for the LLC intersection and another for the NUMA
node intersection). That's not a big problem, but it adds overhead. And
most of the time the LLC is either a subset of the allowed CPUs or
vice versa, and in those cases the current logic already works.

The extra cpumask work is needed only when the allowed cpumask spans
multiple partial LLCs, which should be rare. So maybe in such cases we
could tolerate the overhead of updating an extra temporary cpumask to
ensure proper hierarchical semantics (i.e., staying consistent with the
topology hierarchy). WDYT?

> 
> >  - select an idle CPU within the same node, if the node domain is a
> >    subset of the allowed domain,
> 
> Ditto.
> 
> >  - select an idle CPU within the allowed domain.
> > 
> > If the allowed domain is empty or NULL, the behavior of the built-in
> > idle CPU selection policy remains unchanged.
> > 
> > This only introduces the core concept of the allowed domain. This
> > functionality will be exposed through a dedicated kfunc in a separate
> > patch.
> ...
> > -s32 scx_select_cpu_dfl(struct task_struct *p, s32 prev_cpu, u64 wake_flags, u64 flags)
> > +s32 scx_select_cpu_dfl(struct task_struct *p, const struct cpumask *cpus_allowed,
> > +		       s32 prev_cpu, u64 wake_flags, u64 flags)
> 
> Maybe rearrange them (p, prev_cpu, wake_flags, and_cpumask, pick_idle_flags)
> so that the first three args align with select_task_rq() and we don't have
> three consecutive integer arguments? Two back-to-back flag args increase the
> chance of subtle bugs.

Good idea. I even introduced a bug while updating the kselftests, because
I swapped wake_flags and the idle flags... so yeah, I'll definitely do
that.

Thanks!
-Andrea



