On Wed, Nov 18, 2020 at 08:48:43PM +0100, Thomas Gleixner wrote:

> @@ -4073,6 +4089,7 @@ prepare_task_switch(struct rq *rq, struc
>  	perf_event_task_sched_out(prev, next);
>  	rseq_preempt(prev);
>  	fire_sched_out_preempt_notifiers(prev, next);
> +	kmap_local_sched_out();
>  	prepare_task(next);
>  	prepare_arch_switch(next);
>  }

> @@ -4139,6 +4156,7 @@ static struct rq *finish_task_switch(str
>  	finish_lock_switch(rq);
>  	finish_arch_post_lock_switch();
>  	kcov_finish_switch(current);
> +	kmap_local_sched_in();
>
>  	fire_sched_in_preempt_notifiers(current);
>  	/*

> +void __kmap_local_sched_out(void)
> +{
> +	struct task_struct *tsk = current;
> +	pte_t *kmap_pte = kmap_get_pte();
> +	int i;
> +
> +	/* Clear kmaps */
> +	for (i = 0; i < tsk->kmap_ctrl.idx; i++) {
> +	}
> +}
> +
> +void __kmap_local_sched_in(void)
> +{
> +	struct task_struct *tsk = current;
> +	pte_t *kmap_pte = kmap_get_pte();
> +	int i;
> +
> +	/* Restore kmaps */
> +	for (i = 0; i < tsk->kmap_ctrl.idx; i++) {
> +	}
> +}

So even in the optimal case this adds an unconditional load of
tsk->kmap_ctrl.idx to schedule() (two misses: one pre- and one
post-switch).

Munging the preempt notifiers behind a static_branch, which in that same
optimal case avoided touching curr->preempt_notifiers, resulted in a
measurable performance improvement. See commit:

  1cde2930e154 ("sched/preempt: Add static_key() to preempt_notifiers")

Can we fudge some state in a cacheline we're already touching to avoid
this?
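
Something like the below is the sort of thing I had in mind (completely
untested sketch; PF_KMAP_LOCAL and its value are made up and would need a
genuinely free PF_ bit): key the hooks off a bit in task->flags, which the
switch path already reads, so the common no-kmap case never touches
kmap_ctrl at all:

/*
 * Untested sketch: gate the kmap_local switch hooks on a task->flags bit
 * so the no-kmap case only reads a cacheline schedule() already touches,
 * instead of loading tsk->kmap_ctrl.idx unconditionally.
 */
#define PF_KMAP_LOCAL	0x01000000	/* made-up value, needs a free PF_ bit */

static inline void kmap_local_sched_out(void)
{
	if (unlikely(current->flags & PF_KMAP_LOCAL))
		__kmap_local_sched_out();
}

static inline void kmap_local_sched_in(void)
{
	if (unlikely(current->flags & PF_KMAP_LOCAL))
		__kmap_local_sched_in();
}

kmap_local_page() would set the bit when kmap_ctrl.idx goes from zero to
non-zero and kunmap_local() would clear it when it drops back to zero, so
the extra cacheline is only paid by tasks that actually hold a local kmap
across a context switch.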