On 12/07/24 19:43, Peter Zijlstra wrote:
> On Thu, Jul 11, 2024 at 03:00:04PM +0200, Valentin Schneider wrote:
> 
>> +static void throttle_one_task(struct cfs_rq *cfs_rq, struct task_struct *p)
>> {
>> +	long task_delta, idle_task_delta;
>> +	struct sched_entity *se = &p->se;
>> +
>> +	list_add(&p->throttle_node, &cfs_rq->throttled_limbo_list);
>>
>> +	task_delta = 1;
>> +	idle_task_delta = cfs_rq_is_idle(cfs_rq) ? 1 : 0;
>> +
>> +	for_each_sched_entity(se) {
>> +		cfs_rq = cfs_rq_of(se);
>> +
>> +		if (!se->on_rq)
>> +			return;
>> +
>> +		dequeue_entity(cfs_rq, se, DEQUEUE_SLEEP);
>> +		cfs_rq->h_nr_running -= task_delta;
>> +		cfs_rq->idle_h_nr_running -= idle_task_delta;
>> +
>> +		if (cfs_rq->load.weight) {
>> +			/* Avoid re-evaluating load for this entity: */
>> +			se = parent_entity(se);
>> +			break;
>> +		}
>> +	}
>> +
>> +	for_each_sched_entity(se) {
>> +		cfs_rq = cfs_rq_of(se);
>> +		/* throttled entity or throttle-on-deactivate */
>> +		if (!se->on_rq)
>> +			goto throttle_done;
>> +
>> +		update_load_avg(cfs_rq, se, 0);
>> +		se_update_runnable(se);
>> +		cfs_rq->h_nr_running -= task_delta;
>> +		cfs_rq->idle_h_nr_running -= idle_task_delta;
>> +	}
>> +
>> +throttle_done:
>> +	/* At this point se is NULL and we are at root level */
>> +	sub_nr_running(rq_of(cfs_rq), 1);
>> }
>
> I know you're just moving code around, but we should look if we can
> share code between this and dequeue_task_fair().
>
> I have patches around this in that eevdf series I should send out again,
> I'll try and have a stab.
>

Looking at this again, couldn't this actually just be:

  list_add(&p->throttle_node, &cfs_rq->throttled_limbo_list);
  dequeue_task_fair(rq, p, DEQUEUE_SLEEP);

? Because the main deltas between throttle_one_task() and
dequeue_task_fair() are (other than the list_add()):

o Caring about throttled entities (!se->on_rq) - which AFAICT can't happen
  after this patch, since non-empty cfs_rq's are kept enqueued
o Load tracking faff (util_est, an extra update_cfs_group())
o The set_next_buddy() thing, which will always be a nop because of
  throttled_hierarchy()
o A rq->next_balance update
o hrtick_update()

I think some of these are omitted from throttle_cfs_rq() because the whole
cfs_rq is being taken out, but here we are genuinely taking just one task
out, so I think this does want to be pretty much dequeue_task_fair().
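
IOW, a rough (untested) sketch of what the collapsed helper could look
like - assuming dequeue_task_fair() really is usable as-is from the
throttle path and that grabbing the rq via rq_of() is fine here:

  static void throttle_one_task(struct cfs_rq *cfs_rq, struct task_struct *p)
  {
          struct rq *rq = rq_of(cfs_rq);

          /* Park the task on the limbo list until unthrottle puts it back */
          list_add(&p->throttle_node, &cfs_rq->throttled_limbo_list);

          /* Let the common path do the hierarchy walk and accounting */
          dequeue_task_fair(rq, p, DEQUEUE_SLEEP);
  }

That would leave the limbo list as the only throttle-specific state touched
here, with all the h_nr_running/idle_h_nr_running/PELT/nr_running accounting
kept in the one common dequeue path.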