Hi Harshit,

Thanks for this!

On 07/03/25 20:42, Harshit Agarwal wrote:
> This fix is the deadline version of the change made to the rt scheduler
> here:
> https://lore.kernel.org/lkml/20250225180553.167995-1-harshit@xxxxxxxxxxx/
> Please go through the original change for more details on the issue.

I don't think we want this kind of URL in the changelog, as a URL might
disappear while the history remains (at least usually a little longer :).
Maybe you could add a very condensed version of the description of the
problem you have in the other fix?

> In this fix we bail out or retry in the push_dl_task, if the task is no
> longer at the head of pushable tasks list because this list changed
> while trying to lock the runqueue of the other CPU.
>
> Signed-off-by: Harshit Agarwal <harshit@xxxxxxxxxxx>
> Cc: stable@xxxxxxxxxxxxxxx
> ---
>  kernel/sched/deadline.c | 25 +++++++++++++++++++++----
>  1 file changed, 21 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index 38e4537790af..c5048969c640 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -2704,6 +2704,7 @@ static int push_dl_task(struct rq *rq)
>  {
>  	struct task_struct *next_task;
>  	struct rq *later_rq;
> +	struct task_struct *task;
>  	int ret = 0;
>
>  	next_task = pick_next_pushable_dl_task(rq);
> @@ -2734,15 +2735,30 @@ static int push_dl_task(struct rq *rq)
>
>  	/* Will lock the rq it'll find */
>  	later_rq = find_lock_later_rq(next_task, rq);
> -	if (!later_rq) {
> -		struct task_struct *task;
> +	task = pick_next_pushable_dl_task(rq);
> +	if (later_rq && (!task || task != next_task)) {
> +		/*
> +		 * We must check all this again, since
> +		 * find_lock_later_rq releases rq->lock and it is
> +		 * then possible that next_task has migrated and
> +		 * is no longer at the head of the pushable list.
> +		 */
> +		double_unlock_balance(rq, later_rq);
> +		if (!task) {
> +			/* No more tasks */
> +			goto out;
> +		}
>
> +		put_task_struct(next_task);
> +		next_task = task;
> +		goto retry;

I fear we might hit a pathological condition that can lead us into a
never ending (or very long) loop: find_lock_later_rq() tries to find a
later_rq for at most DL_MAX_TRIES and bails out if it can't, whereas
this retry loop has no such bound.

Maybe, to discern between find_lock_later_rq() callers, we can use the
dl_throttled flag in dl_se and still implement the fix in
find_lock_later_rq()? I.e., do something similar to the rt.c patch when
the task is not throttled (so the caller is push_dl_task()), and not
rely on pick_next_pushable_dl_task() when the task is throttled.

What do you think?

Thanks,
Juri
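Completely untested sketch of what I mean, just to make the suggestion
concrete (names follow the existing deadline.c helpers; most of the
re-checks the current code does after double_lock_balance() are elided,
and the dl_throttled handling is only illustrative):

```c
/*
 * Sketch: revalidate inside find_lock_later_rq() instead of retrying in
 * push_dl_task(). A throttled task is being migrated from the inactive
 * timer path, so it is not expected to be on the pushable list; only
 * the push_dl_task() caller needs the pick_next_pushable_dl_task()
 * re-check, as in the rt.c fix.
 */
static struct rq *find_lock_later_rq(struct task_struct *task, struct rq *rq)
{
	struct rq *later_rq = NULL;
	int tries;

	for (tries = 0; tries < DL_MAX_TRIES; tries++) {
		int cpu = find_later_rq(task);

		if ((cpu == -1) || (cpu == rq->cpu))
			break;

		later_rq = cpu_rq(cpu);

		if (double_lock_balance(rq, later_rq)) {
			/*
			 * rq->lock was dropped, so the world may have
			 * changed: re-check that the task has not moved
			 * and, in the non-throttled (push_dl_task())
			 * case, that it is still the next pushable task.
			 * (The other existing re-checks are elided here.)
			 */
			if (unlikely(task_rq(task) != rq ||
				     (!task->dl.dl_throttled &&
				      task != pick_next_pushable_dl_task(rq)))) {
				double_unlock_balance(rq, later_rq);
				later_rq = NULL;
				break;
			}
		}

		if (dl_task_is_earliest_deadline(task, later_rq))
			break;

		/* This rq is no good, try again. */
		double_unlock_balance(rq, later_rq);
		later_rq = NULL;
	}

	return later_rq;
}
```

This way the number of retries stays bounded by DL_MAX_TRIES and
push_dl_task() keeps its current shape.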