Hello, Peter.

On Mon, Jun 24, 2024 at 01:32:12PM +0200, Peter Zijlstra wrote:
> On Wed, May 01, 2024 at 05:09:44AM -1000, Tejun Heo wrote:
> > ->rq_{on|off}line are called either during CPU hotplug or cpuset partition
> > updates. A planned BPF extensible sched_class wants to tell the BPF
> > scheduler progs about CPU hotplug events in a way that's synchronized with
> > rq state changes.
> >
> > As the BPF scheduler progs aren't necessarily affected by cpuset partition
> > updates, we need a way to distinguish the two types of events. Let's add an
> > argument to tell them apart.
>
> That would be a bug. Must not be able to ignore partitions.

So, first of all, this implementation was brittle in assuming that CPU
hotplug events would be called first, and it broke after recent cpuset
changes. In v7, it's replaced by hooks in sched_cpu_[de]activate(), which
has the extra benefit of allowing the BPF hotplug methods to be sleepable.

Taking a step back to the sched domains: they don't translate well to
sched_ext schedulers, where task to CPU associations are often more dynamic
(e.g. multiple CPUs sharing a task queue) and load balancing operations can
be implemented pretty differently from CFS. The benefits of exposing sched
domains directly to the BPF schedulers are unclear, as most of the relevant
information can be obtained from userspace already.

The cgroup support side isn't fully developed yet (e.g. cpu.weight is
available but I haven't added cpu.max yet) and plans can always change, but
I was thinking of taking a similar approach to cpu.weight for cpuset's
isolation features, i.e. give the BPF scheduler a way to access the user's
configuration and let it implement whatever it wants to implement.

Thanks.

--
tejun