One of the last ports of call before rescheduling is triggered is
resched_curr(). Its task is to set TIF_NEED_RESCHED and, if running
locally, either fold it into the preempt_count, or send a resched-IPI
so the target CPU folds it in.

To handle TIF_NEED_RESCHED_LAZY -- since the reschedule is not
imminent -- it only needs to set the appropriate bit.

Move all of the underlying mechanism into __resched_curr(), and define
resched_curr(), which handles the policy on when we want to set which
need-resched variant.

For now the approach is to run to completion (TIF_NEED_RESCHED_LAZY),
with the following exceptions, where we always want to reschedule at
the next preemptible point (TIF_NEED_RESCHED):

 - idle: if we are polling in idle, then set_nr_if_polling() will do
   the right thing. When not polling, we force TIF_NEED_RESCHED and
   send a resched-IPI if needed.

 - the target CPU is in userspace: run to completion semantics are
   only for kernel tasks.

 - running under the full preemption model.

Originally-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Signed-off-by: Ankur Arora <ankur.a.arora@xxxxxxxxxx>
---
 kernel/sched/core.c | 80 +++++++++++++++++++++++++++++++++++++++------
 1 file changed, 70 insertions(+), 10 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 01df5ac2982c..f65bf3ce0e9d 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1027,13 +1027,13 @@ void wake_up_q(struct wake_q_head *head)
 }
 
 /*
- * resched_curr - mark rq's current task 'to be rescheduled now'.
+ * __resched_curr - mark rq's current task 'to be rescheduled'.
  *
- * On UP this means the setting of the need_resched flag, on SMP it
- * might also involve a cross-CPU call to trigger the scheduler on
- * the target CPU.
+ * On UP this means the setting of the need_resched flag, on SMP, for
+ * eager resched it might also involve a cross-CPU call to trigger
+ * the scheduler on the target CPU.
  */
-void resched_curr(struct rq *rq)
+void __resched_curr(struct rq *rq, resched_t rs)
 {
 	struct task_struct *curr = rq->curr;
 	int cpu;
@@ -1046,17 +1046,77 @@ void resched_curr(struct rq *rq)
 	cpu = cpu_of(rq);
 
 	if (cpu == smp_processor_id()) {
-		set_tsk_need_resched(curr, RESCHED_eager);
-		set_preempt_need_resched();
+		set_tsk_need_resched(curr, rs);
+		if (rs == RESCHED_eager)
+			set_preempt_need_resched();
 		return;
 	}
 
-	if (set_nr_and_not_polling(curr, RESCHED_eager))
-		smp_send_reschedule(cpu);
-	else
+	if (set_nr_and_not_polling(curr, rs)) {
+		if (rs == RESCHED_eager)
+			smp_send_reschedule(cpu);
+	} else if (rs == RESCHED_eager)
 		trace_sched_wake_idle_without_ipi(cpu);
 }
 
+/*
+ * resched_curr - mark rq's current task 'to be rescheduled' eagerly
+ * or lazily according to the current policy.
+ *
+ * Always schedule eagerly, if:
+ *
+ * - running under full preemption
+ *
+ * - idle: when not polling (or if we don't have TIF_POLLING_NRFLAG)
+ *   force TIF_NEED_RESCHED to be set and send a resched IPI.
+ *   (the polling case has already set TIF_NEED_RESCHED via
+ *   set_nr_if_polling()).
+ *
+ * - in userspace: run to completion semantics are only for kernel tasks
+ *
+ * Otherwise (regardless of priority), run to completion.
+ */
+void resched_curr(struct rq *rq)
+{
+	resched_t rs = RESCHED_lazy;
+	int context;
+
+	if (IS_ENABLED(CONFIG_PREEMPT) ||
+	    (rq->curr->sched_class == &idle_sched_class)) {
+		rs = RESCHED_eager;
+		goto resched;
+	}
+
+	/*
+	 * We might race with the target CPU while checking its ct_state:
+	 *
+	 * 1. The task might have just entered the kernel, but has not yet
+	 *    called user_exit(). We will see stale state (CONTEXT_USER) and
+	 *    send an unnecessary resched-IPI.
+	 *
+	 * 2. The user task is through with exit_to_user_mode_loop() but has
+	 *    not yet called user_enter().
+	 *
+	 * We'll see the thread's state as CONTEXT_KERNEL and will try to
+	 * schedule it lazily. There's obviously nothing that will handle
+	 * this need-resched bit until the thread enters the kernel next.
+	 *
+	 * The scheduler will still do tick accounting, but a potentially
+	 * higher priority task waited to be scheduled for a user tick,
+	 * instead of execution time in the kernel.
+	 */
+	context = ct_state_cpu(cpu_of(rq));
+	if ((context == CONTEXT_USER) ||
+	    (context == CONTEXT_GUEST)) {
+
+		rs = RESCHED_eager;
+		goto resched;
+	}
+
+resched:
+	__resched_curr(rq, rs);
+}
+
 void resched_cpu(int cpu)
 {
 	struct rq *rq = cpu_rq(cpu);
-- 
2.31.1
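As a reading aid (not part of the patch): below is a minimal, standalone
userspace sketch of the policy that the new resched_curr() applies before
handing the chosen variant to __resched_curr() -- eager when running fully
preemptible, when the target is the idle task, or when the target CPU is
in user/guest context, lazy otherwise. The resched_t and context enums
mirror names from the patch; pick_resched() and the main() harness are
hypothetical stand-ins for illustration only, not kernel APIs.

#include <stdbool.h>
#include <stdio.h>

typedef enum { RESCHED_lazy, RESCHED_eager } resched_t;
typedef enum { CONTEXT_KERNEL, CONTEXT_USER, CONTEXT_GUEST } context_t;

/*
 * pick_resched() models only the policy choice; the real resched_curr()
 * then calls __resched_curr(rq, rs) to set the chosen need-resched bit.
 */
static resched_t pick_resched(bool full_preempt, bool curr_is_idle,
			      context_t ct)
{
	/* Full preemption or an idle target: always reschedule eagerly. */
	if (full_preempt || curr_is_idle)
		return RESCHED_eager;

	/* Target CPU in user/guest context: a lazy bit would go unseen. */
	if (ct == CONTEXT_USER || ct == CONTEXT_GUEST)
		return RESCHED_eager;

	/* Kernel task under a lazy model: let it run to completion. */
	return RESCHED_lazy;
}

int main(void)
{
	/* Kernel task, no full preemption: lazy. */
	printf("kernel task: %s\n",
	       pick_resched(false, false, CONTEXT_KERNEL) == RESCHED_lazy ?
	       "lazy" : "eager");

	/* Target CPU executing in userspace: eager. */
	printf("user context: %s\n",
	       pick_resched(false, false, CONTEXT_USER) == RESCHED_lazy ?
	       "lazy" : "eager");
	return 0;
}

Built with a plain C compiler, the two cases print "lazy" and "eager"
respectively, matching the run-to-completion default for kernel tasks and
the userspace exception described in the commit message.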