Hello, On Mon, Jan 30, 2023 at 01:38:15PM -0800, Josh Don wrote: > > The core-sched support is composed of the following parts: > > Thanks, this looks pretty reasonable overall. > > One meta comment is that I think we can shortcircuit from > touch_core_sched when we have sched_core_disabled(). Yeah, touch_core_sched() is really cheap (it's just an assignment from an rq field to a task field) but sched_core_disabled() is also just a static branch. Will update. > Reviewed-by: Josh Don <joshdon@xxxxxxxxxx> > > > + /* > > + * While core-scheduling, rq lock is shared among > > + * siblings but the debug annotations and rq clock > > + * aren't. Do pinning dance to transfer the ownership. > > + */ > > + WARN_ON_ONCE(__rq_lockp(rq) != __rq_lockp(srq)); > > + rq_unpin_lock(rq, rf); > > + rq_pin_lock(srq, &srf); > > + > > + update_rq_clock(srq); > > Unfortunate that we have to do this superfluous update; maybe we can > save/restore the clock flags from before the pinning shenanigans? So, this one isn't really superflous. There are two rq's involved - self and sibling. self's rq clock is saved and restored through rq_unpin_lock() and rq_repin_lock(). We're transferring the lock owner ship from self to sibling without actually unlocking and relocking the lock as they should be sharing the same lock; however, that doesn't mean that the two queues share rq clocks, so the sibling needs to update its rq clock upon getting the lock transferred to it. It might make sense to make the siblings share the rq clock when core-sched is enabled but that's probably for some other time. > > +static struct task_struct *pick_task_scx(struct rq *rq) > > +{ > > + struct task_struct *curr = rq->curr; > > + struct task_struct *first = first_local_task(rq); > > + > > + if (curr->scx.flags & SCX_TASK_QUEUED) { > > + /* is curr the only runnable task? */ > > + if (!first) > > + return curr; > > + > > + /* > > + * Does curr trump first? We can always go by core_sched_at for > > + * this comparison as it represents global FIFO ordering when > > + * the default core-sched ordering is in used and local-DSQ FIFO > > + * ordering otherwise. > > + */ > > + if (curr->scx.slice && time_before64(curr->scx.core_sched_at, > > + first->scx.core_sched_at)) > > + return curr; > > So is this to handle the case where we have something running on 'rq' > to match the cookie of our sibling (which had priority), but now we > want to switch to running the first thing in the local queue, which > has a different cookie (and is now the highest priority entity)? Maybe > being slightly more specific in the comment would help :) pick_task_scx() is to pick the next best candidate for the rq. The candidates then compete to determine the next cookie. Here, as long as only one task gets dispatched at any given time, the condition check shouldn't really trigger - ie. if curr has slice remaining, balance_one() wouldn't have populated the local DSQ anyway and first would be NULL. However, the BPF scheduler is free to dispatch whatever tasks anytime (e.g. scx_example_central), so it's possible that a task with an earlier timestamp has been dispatched to the local DSQ since curr started executing, in which case we likely want to return the first on DSQ as the CPU's candidate. IOW, the time_before() is there mostly to cover unusual cases. Most should either trigger !first before or fail the curr->scx.slice test. Will update the comment to clarify. Thanks. -- tejun