Excerpts from Nicholas Piggin's message of June 5, 2021 10:17 am:
> Excerpts from Andy Lutomirski's message of June 5, 2021 3:05 am:
>> On 6/4/21 9:54 AM, Andy Lutomirski wrote:
>>> On 5/31/21 11:22 PM, Nicholas Piggin wrote:
>>>> There haven't been objections to the series since last posting; this
>>>> is just a rebase that tidies up a few comments and does some minor
>>>> patch rearranging.
>>>>
>>>
>>> I continue to object to having too many modes. I like my more generic
>>> improvements better. Let me try to find some time to email again.
>>>
>>
>> Specifically, this:
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/mm
>
> That's worse than what powerpc does with the shoot lazies code, so
> we wouldn't use it anyway.
>
> The fact is that mm-cpumask and lazy mm handling are very architecture
> specific, so I don't really see that another "mode" is such a problem;
> for the most part it's "this is what powerpc does" -> "this is what
> powerpc does". The only mode in the context switch is just "take a ref
> on the lazy mm" or "don't take a ref". Surely that's not too onerous to
> add!?
>
> Actually the bigger part of it is the no-lazy mmu mode, which is not
> yet used. I thought it was a neat little demonstrator of how the code
> works with/without lazy, but I will get rid of that for submission.

I admit that does add a bit more churn than necessary; maybe that was
the main objection. Here is the entire kernel/sched/core.c change after
that is removed. Pretty simple now. I'll resubmit.
Thanks,
Nick

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index e359c76ea2e2..1be0b97e12ec 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4171,7 +4171,7 @@ static struct rq *finish_task_switch(struct task_struct *prev)
 	__releases(rq->lock)
 {
 	struct rq *rq = this_rq();
-	struct mm_struct *mm = rq->prev_mm;
+	struct mm_struct *mm = NULL;
 	long prev_state;
 
 	/*
@@ -4190,7 +4190,10 @@ static struct rq *finish_task_switch(struct task_struct *prev)
 		      current->comm, current->pid, preempt_count()))
 		preempt_count_set(FORK_PREEMPT_COUNT);
 
-	rq->prev_mm = NULL;
+#ifdef CONFIG_MMU_LAZY_TLB_REFCOUNT
+	mm = rq->prev_lazy_mm;
+	rq->prev_lazy_mm = NULL;
+#endif
 
 	/*
 	 * A task struct has one reference for the use as "current".
@@ -4326,9 +4329,21 @@ context_switch(struct rq *rq, struct task_struct *prev,
 		switch_mm_irqs_off(prev->active_mm, next->mm, next);
 
 		if (!prev->mm) {                        // from kernel
-			/* will mmdrop_lazy_tlb() in finish_task_switch(). */
-			rq->prev_mm = prev->active_mm;
+#ifdef CONFIG_MMU_LAZY_TLB_REFCOUNT
+			/* Will mmdrop_lazy_tlb() in finish_task_switch(). */
+			rq->prev_lazy_mm = prev->active_mm;
 			prev->active_mm = NULL;
+#else
+			/*
+			 * Without MMU_LAZY_TLB_REFCOUNT there is no lazy
+			 * tracking (because no rq->prev_lazy_mm) in
+			 * finish_task_switch, so no mmdrop_lazy_tlb(),
+			 * so no memory barrier for membarrier (see the
+			 * membarrier comment in finish_task_switch()).
+			 * Do it here.
+			 */
+			smp_mb();
+#endif
 		}
 	}

diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index a189bec13729..0729cf19a987 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -961,7 +961,9 @@ struct rq {
 	struct task_struct	*idle;
 	struct task_struct	*stop;
 	unsigned long		next_balance;
-	struct mm_struct	*prev_mm;
+#ifdef CONFIG_MMU_LAZY_TLB_REFCOUNT
+	struct mm_struct	*prev_lazy_mm;
+#endif
 
 	unsigned int		clock_update_flags;
 	u64			clock;