On Tue, Oct 15, 2024 at 04:41:26AM -1000, Tejun Heo wrote: > Hello, Andrea. > > On Tue, Oct 15, 2024 at 01:15:39PM +0200, Andrea Righi wrote: > ... > > For example, a BPF scheduler might use logic like the following to keep > > the CPU active under specific conditions: > > > > void BPF_STRUCT_OPS(sched_update_idle, s32 cpu, bool idle) > > { > > if (!idle) > > return; > > if (condition) > > scx_bpf_kick_cpu(cpu, 0); > > } > > > > A call to scx_bpf_kick_cpu() wakes up the CPU, so in theory, > > ops.update_idle() should be triggered again until the condition becomes > > false. However, this doesn't happen, and scx_bpf_kick_cpu() doesn't > > produce the expected effect. > > I thought more about this scenario and I'm not sure anymore whether we want > to guarantee that scx_bpf_kick_cpu() is followed by update_idle(cpu, true). > Here are a couple considerations: > > - As implemented, the transtions aren't balanced. ie. When the above > happens, update_idle(cpu, true) will be generated multiple times without > intervening update_idle(cpu, false). We can insert artificial false > transtions but that's cumbersome and... Agreed, I wouldn't suggest adding artificial false events. > > - For the purpose of determining whether a CPU is idle for e.g. task > placement from ops.select_cpu(). The CPU *should* be considered idle in > this polling state. > > Overall, it feels a bit contrived to generate update_idle() events > consecutively for this. If a scheduler wants to poll in idle state, can't it > do something like the following? > > - Trigger kick from update_idle(cpu, true) and remember that the CPU is in > the polling state. > > - Keep kicking from ops.dispatch() until polling state is cleared. > > As what kick() guarnatees is at least one dispatch event after kicking, this > is guaranteed to be correct and the control flow, while a bit more > complicated, makes sense - it triggers dispatch on idle transition and keeps > dispatching in the idle state. > > What do you think? That seems to work in theory, I'll run some tests to confirm that it also works in practice. :) It looks definitely nicer than triggering multiple ops.update_idle() from the kernel and we can maintain the proper semantic of triggering update_idle() only on actual idle state changes. Thanks, -Andrea