Re: [PATCH sched_ext/for-6.11] sched, sched_ext: Replace scx_next_task_picked() with sched_class->switch_class()

Peter Zijlstra <peterz@xxxxxxxxxxxxx> · Tue, 25 Jun 2024 09:49:35 +0200

On Mon, Jun 24, 2024 at 11:01:10AM -1000, Tejun Heo wrote:
> Hello, Peter.
> 
> On Mon, Jun 24, 2024 at 10:59:27AM +0200, Peter Zijlstra wrote:
> > > @@ -5907,7 +5907,10 @@ restart:
> > >  	for_each_active_class(class) {
> > >  		p = class->pick_next_task(rq);
> > >  		if (p) {
> > > -			scx_next_task_picked(rq, p, class);
> > > +			const struct sched_class *prev_class = prev->sched_class;
> > > +
> > > +			if (class != prev_class && prev_class->switch_class)
> > > +				prev_class->switch_class(rq, p);
> > 
> > I would much rather see sched_class::pick_next_task() get an extra
> > argument so that the BPF thing can do what it needs in there and we can
> > avoid this extra code here.
> 
> Hmm... but here, the previous class's ->pick_next_task() might not be called
> at all, so I'm not sure how that'd work. For context, sched_ext is using
> this to tell the BPF scheduler that it lost a CPU to a higher priority class
> (be that RT or CFS) os that the BPF scheduler can respond if necessary (e.g.
> punting tasks that were queued on that CPU somewhere else and so on).
> 
> Imagine a case where a sched_ext task was running but then a RT task wakes
> up on the CPU. We'd enter the scheduling path, RT's pick_next_task() would
> return the new RT task to run. We now need to tell the BPF scheduler that we
> lost the CPU to the RT task but haven't called its pick_next_task() yet.

Bah, I got it backwards indeed. But in this case, don't you also need
something in pick_task() -- the whole core scheduling thing does much
the same.