Damn, I wrote a reply on the 20th but seems like I never sent it. At least I found it saved somewhere, so I don't have to rewrite it... On 20/11/24 10:03, Peter Zijlstra wrote: > On Tue, Nov 19, 2024 at 04:30:02PM -0800, Chenbo Lu wrote: >> Hello, >> >> I am experiencing a significant performance degradation after >> upgrading my kernel from version 6.6 to 6.8 and would appreciate any >> insights or suggestions. >> >> I am running a high-load simulation system that spawns more than 1000 >> threads and the overall CPU usage is 30%+ . Most of the threads are >> using real-time >> scheduling (SCHED_RR), and the threads of a model are using >> SCHED_DEADLINE. After upgrading the kernel, I noticed that the >> execution time of my model has increased from 4.5ms to 6ms. >> >> What I Have Done So Far: >> 1. I found this [bug >> report](https://bugzilla.kernel.org/show_bug.cgi?id=219366#c7) and >> reverted the commit efa7df3e3bb5da8e6abbe37727417f32a37fba47 mentioned >> in the post. Unfortunately, this did not resolve the issue. >> 2. I performed a git bisect and found that after these two commits >> related to scheduling (RT and deadline) were merged, the problem >> happened. They are 612f769edd06a6e42f7cd72425488e68ddaeef0a, >> 5fe7765997b139e2d922b58359dea181efe618f9 > > And yet you failed to Cc Valentin, the author of said commits :/ > >> After reverting these two commits, the model execution time improved >> to around 5 ms. >> 3. I revert two more commits, and the execution time is back to 4.7ms: >> 63ba8422f876e32ee564ea95da9a7313b13ff0a1, >> efa7df3e3bb5da8e6abbe37727417f32a37fba47 >> >> My questions are: >> 1.Has anyone else experienced similar performance degradation after >> upgrading to kernel 6.8? > > This is 4 kernel releases back, I my memory isn't that long. > >> 2.Can anyone explain why these two commits are causing the problem? I >> am not very familiar with the kernel code and would appreciate any >> insights. > > There might be a race window between setting the tro and sending the > IPI, such that previously the extra IPIs would sooner find the newly > pushable task. > > Valentin, would it make sense to set tro before enqueueing the pushable, > instead of after it? Urgh, those cachelines are beyond cold... /me goes reading Ok yeah I guess we could have this race vs rto_push_irq_work_func() `\ rto_next_cpu() Not sure if that applies to DL too since it doesn't have the PUSH_IPI thing, but anyway - Chenbo, could you please try the below? --- diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c index d9d5a702f1a61..270a25335c4bc 100644 --- a/kernel/sched/deadline.c +++ b/kernel/sched/deadline.c @@ -602,16 +602,16 @@ static void enqueue_pushable_dl_task(struct rq *rq, struct task_struct *p) WARN_ON_ONCE(!RB_EMPTY_NODE(&p->pushable_dl_tasks)); + if (!rq->dl.overloaded) { + dl_set_overload(rq); + rq->dl.overloaded = 1; + } + leftmost = rb_add_cached(&p->pushable_dl_tasks, &rq->dl.pushable_dl_tasks_root, __pushable_less); if (leftmost) rq->dl.earliest_dl.next = p->dl.deadline; - - if (!rq->dl.overloaded) { - dl_set_overload(rq); - rq->dl.overloaded = 1; - } } static void dequeue_pushable_dl_task(struct rq *rq, struct task_struct *p) diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c index bd66a46b06aca..1ea45a45ed657 100644 --- a/kernel/sched/rt.c +++ b/kernel/sched/rt.c @@ -385,6 +385,15 @@ static inline void rt_queue_pull_task(struct rq *rq) static void enqueue_pushable_task(struct rq *rq, struct task_struct *p) { + /* + * Set RTO first so rto_push_irq_work_func() can see @rq as a push + * candidate as early as possible. + */ + if (!rq->rt.overloaded) { + rt_set_overload(rq); + rq->rt.overloaded = 1; + } + plist_del(&p->pushable_tasks, &rq->rt.pushable_tasks); plist_node_init(&p->pushable_tasks, p->prio); plist_add(&p->pushable_tasks, &rq->rt.pushable_tasks); @@ -392,15 +401,15 @@ static void enqueue_pushable_task(struct rq *rq, struct task_struct *p) /* Update the highest prio pushable task */ if (p->prio < rq->rt.highest_prio.next) rq->rt.highest_prio.next = p->prio; - - if (!rq->rt.overloaded) { - rt_set_overload(rq); - rq->rt.overloaded = 1; - } } static void dequeue_pushable_task(struct rq *rq, struct task_struct *p) { + /* + * To match enqueue we should check/unset RTO first, but for that we + * need to pop @p first. This makes this asymmetric wrt enqueue, but + * the worst we can get out of this is an extra useless IPI. + */ plist_del(&p->pushable_tasks, &rq->rt.pushable_tasks); /* Update the new highest prio pushable task */