On Tue, 11 Feb 2025 at 04:50, Rik van Riel <riel@xxxxxxxxxxx> wrote: > > On Mon, 2025-02-10 at 16:27 +0100, Brendan Jackman wrote: > > On Thu, 6 Feb 2025 at 05:46, Rik van Riel <riel@xxxxxxxxxxx> wrote: > > > /* Wait for INVLPGB originated by this CPU to complete. */ > > > -static inline void tlbsync(void) > > > +static inline void __tlbsync(void) > > > { > > > - cant_migrate(); > > > > Why does this have to go away? > > I'm not sure the current task in sched_init() has > all the correct bits set to prevent the warning > from firing, but on the flip side it won't have > called INVLPGB yet at that point, so the call to > enter_lazy_tlb() won't actually end up here. > > I'll put it back. Sounds good. FWIW I think if we do run into early-boot code hitting false DEBUG_ATOMIC_SLEEP warnings, the best response might be to update the DEBUG_ATOMIC_SLEEP code. Like maybe there's a more targeted solution but something roughly equivalent to checking if (system_state == SYSTEM_STATE_SCHEDULING) before the warning. > > > @@ -794,6 +825,8 @@ void switch_mm_irqs_off(struct mm_struct > > > *unused, struct mm_struct *next, > > > if (IS_ENABLED(CONFIG_PROVE_LOCKING)) > > > WARN_ON_ONCE(!irqs_disabled()); > > > > > > + tlbsync(); > > > + > > > /* > > > * Verify that CR3 is what we think it is. This will catch > > > * hypothetical buggy code that directly switches to > > > swapper_pg_dir > > > @@ -973,6 +1006,8 @@ void switch_mm_irqs_off(struct mm_struct > > > *unused, struct mm_struct *next, > > > */ > > > void enter_lazy_tlb(struct mm_struct *mm, struct task_struct *tsk) > > > { > > > + tlbsync(); > > > + > > > > I have a feeling I'll look stupid for asking this, but why do we need > > this and the one in switch_mm_irqs_off()? > > This is an architectural thing: TLBSYNC waits for > the INVLPGB flushes to finish that were issued > from the same CPU. > > That means if we have pending flushes (from the > pageout code), we need to wait for them at context > switch time, before the task could potentially be > migrated to another CPU. Oh right thanks, that makes sense. So I think here we're encoding the assumption that context_switch() always calls either enter_lazy_tlb() or switch_mm_irqs_off(), which is a little awkward, plus the job of these functions is already kinda hazy and this makes it even hazier. What about doing it in arch_start_context_switch() instead? That would mean a bit of plumbing since we'd still wanna have the tlbsync() in tlb.c, but that seems worth it to me. Plus, having it in one place would give us a spot to add a comment. Now that you point it out it does indeed seem obvious but it didn't seem so yesterday. Now I think about it... if we always tlbsync() before a context switch, is the cant_migrate() above actually required? I think with that, even if we migrated in the middle of e.g. broadcast_kernel_range_flush(), we'd be fine? (At least, from the specific perspective of the invplgb code, presumably having preemption on there would break things horribly in other ways).