Re: [PATCH 09/39] sched: Add @reason to sched_class->rq_{on|off}line()

Tejun Heo <tj@xxxxxxxxxx> · Tue, 25 Jun 2024 13:41:01 -1000

Hello,

On Tue, Jun 25, 2024 at 10:29:26AM +0200, Peter Zijlstra wrote:
...
> > Taking a step back to the sched domains. They don't translate well to
> > sched_ext schedulers where task to CPU associations are often more dynamic
> > (e.g. multiple CPUs sharing a task queue) and load balancing operations can
> > be implemented pretty differently from CFS. The benefits of exposing sched
> > domains directly to the BPF schedulers is unclear as most of relevant
> > information can be obtained from userspace already.
> 
> Either which way around you want to turn it, you must not violate
> partitions. If a bpf thing isn't capable of handling partitions, you
> must refuse loading it when a partition exists and equally disallow
> creation of partitions when it does load.
> 
> For partitions specifically, you only need the root_domain, not the full
> sched_domain trees.
> 
> I'm aware you have these shared runqueues, but you don't *have* to do
> that. Esp. so if the user explicitly requested partitions.

As a quick workaround, I can just disallow / eject the BPF scheduler when
partitioning is configured. However, I think I'm still missing something and
would appreciate it if you could fill me in.

Abiding by core scheduling configuration is critical because it has direct
user-visible and security implications, and this can be tested from
userspace - are two threads which shouldn't be on the same core on the same
core or not? So, the violation condition is pretty clear.

However, I'm not sure how partitioning is similar. My understanding is that
it works as a barrier for the load balancer. LB on this side can't look
there and LB on that side can't look here. However, isn't the impact purely
a performance / isolation difference? IOW, let's say you load a BPF
scheduler which consumes the partition information but doesn't do anything
differently based on it. cpumasks are still enforced the same and I can't
think of anything which userspace would be able to test to tell whether
partitioning is working or not.
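
To make the cpumask point concrete, here is a minimal userspace sketch of the
kind of check that stays valid whether or not the scheduler honors
partitions. It is illustrative only and not part of the patchset; the helper
name running_inside_affinity() is made up for this example. It only asks
whether the calling thread is on a CPU its affinity mask allows, which says
nothing about partitioning itself.

```c
#define _GNU_SOURCE
#include <sched.h>

/*
 * Hypothetical helper (name invented for this sketch).
 *
 * Returns 1 if the calling thread is currently running on a CPU that is
 * present in its allowed cpumask, 0 if it is not, and -1 on error.
 *
 * Assumption: the system has no more CPUs than a fixed-size cpu_set_t can
 * describe; larger systems would need CPU_ALLOC() and friends.
 */
int running_inside_affinity(void)
{
	cpu_set_t allowed;
	int cpu;

	CPU_ZERO(&allowed);

	/* pid 0 means "the calling thread". */
	if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0)
		return -1;

	cpu = sched_getcpu();
	if (cpu < 0)
		return -1;

	return CPU_ISSET(cpu, &allowed) ? 1 : 0;
}
```

A check like this should pass under any scheduler that enforces cpumasks,
which is why it can't tell a partition-aware scheduler apart from one that
ignores partitions.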

If the only difference partitions make is in performance, then while it
would make sense to communicate partitions to the BPF scheduler, would it
make sense to reject a BPF scheduler based on it? I.e., assuming that the
feature is implemented, what would distinguish one BPF scheduler which
handles partitions specially from another which doesn't care?

Thanks.

-- 
tejun