Hello,

On Thu, Oct 10, 2024 at 09:12:19PM -0700, Yonghong Song wrote:
> > Let's get priv_stack in shape first (the first ~6 patches).
>
> I am okay to focus on the first 6 patches. But I would like to get
> Tejun's comments about what is the best way to support hierarchical
> bpf based scheduler.

There isn't a concrete design yet, so it's difficult to say anything
definitive, but I was thinking more along the lines of providing sched_ext
kfunc helpers that perform nesting calls rather than each BPF program
directly calling nested BPF programs.

For example, let's say the scheduler hierarchy looks like this:

  R + A + AA
    |   + AB
    + B

Let's say AB has a task waking up to it and is calling ops.select_cpu():

  ops.select_cpu()
  {
          if (does AB already have the perfect CPU sitting around)
                  direct dispatch and return the CPU;
          if (scx_bpf_get_cpus(describe the perfect CPU))
                  direct dispatch and return the CPU;
          if (is there any eligible idle CPU that AB is holding)
                  direct dispatch and return the CPU;
          if (scx_bpf_get_cpus(any eligible CPUs))
                  direct dispatch and return the CPU;
          // no idle CPU, proceed to enqueue
          return prev_cpu;
  }

Note that the scheduler at AB doesn't have any knowledge of what's up the
tree. It's just describing what it wants through the kfunc, which is then
responsible for nesting calls up the hierarchy. Up a layer, this can be
implemented like:

  ops.get_cpus(CPUs description)
  {
          if (has any CPUs matching the description)
                  claim and return the CPUs;
          modify CPUs description to enforce e.g. cache sharing policy;
          and possibly to request more CPUs for batching;
          if (scx_bpf_get_cpus(CPUs description)) {
                  store extra CPUs;
                  claim and return some of the CPUs;
          }
          return no CPUs available;
  }

This way, the schedulers at different layers are isolated and each only has
to express what it wants.

Thanks.

--
tejun
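
The nesting idea in the pseudocode above can be modelled in plain user-space C. This is only an illustrative sketch, not sched_ext code: no concrete design exists yet, so every name here (`struct node`, `claim`, `model_get_cpus`, `default_get_cpus`, `model_select_cpu`) is hypothetical, and a "CPUs description" is reduced to a 64-bit mask of acceptable CPUs. The two-stage "perfect CPU" lookup, description rewriting, and batching of extra CPUs are left out to keep it short.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one scheduler in the hierarchy. */
struct node {
    struct node *parent;                          /* NULL for the root */
    uint64_t idle;                                /* idle CPUs this node holds */
    int (*get_cpus)(struct node *self, uint64_t allowed); /* models ops.get_cpus() */
};

/* Claim the lowest-numbered idle CPU that matches the description. */
int claim(struct node *n, uint64_t allowed)
{
    uint64_t m = n->idle & allowed;
    int cpu = 0;

    if (!m)
        return -1;
    while (!(m & 1)) {
        m >>= 1;
        cpu++;
    }
    n->idle &= ~((uint64_t)1 << cpu);
    return cpu;
}

/* Stand-in for the kfunc: the caller never sees its parent directly,
 * the helper performs the nested call on its behalf. */
int model_get_cpus(struct node *caller, uint64_t allowed)
{
    struct node *p = caller->parent;

    if (!p || !p->get_cpus)
        return -1;
    return p->get_cpus(p, allowed);
}

/* Models a mid-layer ops.get_cpus(): serve locally, else ask upward. */
int default_get_cpus(struct node *self, uint64_t allowed)
{
    int cpu = claim(self, allowed);

    if (cpu >= 0)
        return cpu;
    return model_get_cpus(self, allowed);
}

/* Models a simplified leaf ops.select_cpu(). */
int model_select_cpu(struct node *self, uint64_t allowed, int prev_cpu)
{
    int cpu = claim(self, allowed);

    if (cpu >= 0)
        return cpu;
    cpu = model_get_cpus(self, allowed);
    if (cpu >= 0)
        return cpu;
    return prev_cpu;    /* no idle CPU, proceed to enqueue */
}
```

With a chain R -> A -> AB in which only R holds an idle CPU, a call to `model_select_cpu()` on AB goes through A's and then R's `get_cpus` without AB knowing how deep the tree is, which is the isolation property described above.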