On Mon, Jun 24, 2024 at 11:01:10AM -1000, Tejun Heo wrote: > Hello, Peter. > > On Mon, Jun 24, 2024 at 10:59:27AM +0200, Peter Zijlstra wrote: > > > @@ -5907,7 +5907,10 @@ restart: > > > for_each_active_class(class) { > > > p = class->pick_next_task(rq); > > > if (p) { > > > - scx_next_task_picked(rq, p, class); > > > + const struct sched_class *prev_class = prev->sched_class; > > > + > > > + if (class != prev_class && prev_class->switch_class) > > > + prev_class->switch_class(rq, p); > > > > I would much rather see sched_class::pick_next_task() get an extra > > argument so that the BPF thing can do what it needs in there and we can > > avoid this extra code here. > > Hmm... but here, the previous class's ->pick_next_task() might not be called > at all, so I'm not sure how that'd work. For context, sched_ext is using > this to tell the BPF scheduler that it lost a CPU to a higher priority class > (be that RT or CFS) os that the BPF scheduler can respond if necessary (e.g. > punting tasks that were queued on that CPU somewhere else and so on). > > Imagine a case where a sched_ext task was running but then a RT task wakes > up on the CPU. We'd enter the scheduling path, RT's pick_next_task() would > return the new RT task to run. We now need to tell the BPF scheduler that we > lost the CPU to the RT task but haven't called its pick_next_task() yet. Bah, I got it backwards indeed. But in this case, don't you also need something in pick_task() -- the whole core scheduling thing does much the same.