On Mon, Jun 24, 2024 at 11:18:06AM -1000, Tejun Heo wrote:
> Hello, Peter.
>
> On Mon, Jun 24, 2024 at 01:32:12PM +0200, Peter Zijlstra wrote:
> > On Wed, May 01, 2024 at 05:09:44AM -1000, Tejun Heo wrote:
> > > ->rq_{on|off}line are called either during CPU hotplug or cpuset
> > > partition updates. A planned BPF extensible sched_class wants to
> > > tell the BPF scheduler progs about CPU hotplug events in a way
> > > that's synchronized with rq state changes.
> > >
> > > As the BPF scheduler progs aren't necessarily affected by cpuset
> > > partition updates, we need a way to distinguish the two types of
> > > events. Let's add an argument to tell them apart.
> >
> > That would be a bug. Must not be able to ignore partitions.
>
> So, first of all, this implementation was brittle in assuming CPU
> hotplug events would be called first, and it broke after recent cpuset
> changes. In v7, it's replaced by hooks in sched_cpu_[de]activate(),
> which has the extra benefit of allowing the BPF hotplug methods to be
> sleepable.

Urgh, I suppose I should go stare at v7 then.

> Taking a step back to the sched domains: they don't translate well to
> sched_ext schedulers, where task-to-CPU associations are often more
> dynamic (e.g. multiple CPUs sharing a task queue) and load balancing
> can be implemented quite differently from CFS. The benefit of exposing
> sched domains directly to the BPF schedulers is unclear, as most of the
> relevant information can already be obtained from userspace.

Whichever way around you want to turn it, you must not violate
partitions. If a BPF thing isn't capable of handling partitions, you
must refuse to load it when a partition exists and equally disallow
creation of partitions while it is loaded.

For partitions specifically, you only need the root_domain, not the
full sched_domain trees.

I'm aware you have these shared runqueues, but you don't *have* to do
that. Especially so if the user explicitly requested partitions.
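
Something along these lines is the shape I mean -- a rough sketch only,
the scx_* and cpuset_* names are made up here and locking/ordering is
entirely hand-waved:

	/*
	 * Cpuset partitions manifest as multiple root_domains, so
	 * detecting them only needs rq->rd, not the sched_domain trees.
	 * Callers would need to hold the hotplug/cpuset locks to keep
	 * the rd pointers stable; that's elided in this sketch.
	 */
	static bool partitions_active(void)
	{
		struct root_domain *rd = NULL;
		int cpu;

		/* Without partitions, all online CPUs share one rd. */
		for_each_online_cpu(cpu) {
			if (!rd)
				rd = cpu_rq(cpu)->rd;
			else if (cpu_rq(cpu)->rd != rd)
				return true;
		}
		return false;
	}

	/* Refuse to load a partition-oblivious BPF scheduler ... */
	static int scx_check_can_load(void)
	{
		if (partitions_active())
			return -EBUSY;
		return 0;
	}

	/* ... and refuse to create partitions while one is loaded.
	 * scx_loaded() is a hypothetical predicate for "a BPF
	 * scheduler is currently attached".
	 */
	static int cpuset_can_partition(void)
	{
		if (scx_loaded())
			return -EBUSY;
		return 0;
	}

Each partition's CPUs are then simply rd->span, which is all the
granularity such a scheduler should need to honor the boundaries.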