Hello,

On Tue, Jun 25, 2024 at 10:29:26AM +0200, Peter Zijlstra wrote:
...
> > Taking a step back to the sched domains. They don't translate well to
> > sched_ext schedulers where task to CPU associations are often more dynamic
> > (e.g. multiple CPUs sharing a task queue) and load balancing operations can
> > be implemented pretty differently from CFS. The benefits of exposing sched
> > domains directly to the BPF schedulers is unclear as most of relevant
> > information can be obtained from userspace already.
>
> Either which way around you want to turn it, you must not violate
> partitions. If a bpf thing isn't capable of handling partitions, you
> must refuse loading it when a partition exists and equally disallow
> creation of partitions when it does load.
>
> For partitions specifically, you only need the root_domain, not the full
> sched_domain trees.
>
> I'm aware you have these shared runqueues, but you don't *have* to do
> that. Esp. so if the user explicitly requested partitions.

As a quick workaround, I can just disallow / eject the BPF scheduler when
partitioning is configured. However, I think I'm still missing something and
would appreciate it if you could fill me in.

Abiding by core scheduling configuration is critical because it has direct
user visible and security implications and this can be tested from userspace
- are two threads which shouldn't be on the same core on the same core or
not? So, the violation condition is pretty clear.

However, I'm not sure how partitioning is similar. My understanding is that
it works as a barrier for the load balancer. LB on this side can't look there
and LB on that side can't look here. However, isn't the impact purely a
performance / isolation difference? IOW, let's say you load a BPF scheduler
which consumes the partition information but doesn't do anything differently
based on it.
cpumasks are still enforced the same and I can't think of anything which
userspace would be able to test to tell whether partitioning is working or
not.

If the only difference partitions make is on performance, then while it
would make sense to communicate partitions to the BPF scheduler, would it
make sense to reject a BPF scheduler based on it? I.e., assuming that the
feature is implemented, what would distinguish between one BPF scheduler
which handles partitions specially and another which doesn't care?

Thanks.

--
tejun