> On 20 Jan 2025, at 19:56, Rik van Riel <riel@xxxxxxxxxxx> wrote:
>
> How would you keep track of CPUs where the tlbsync
> has NOT happened before arch_tlbbatch_flush()?
>
> That part seems to be missing still.

You only need to track whether there is a pending tlbsync on *your* CPU.
There is no need to track other CPUs that did not issue a tlbsync during
arch_tlbbatch_add_pending(). If the process that does the reclamation was
migrated, a TLBSYNC is issued during the context switch, before the thread
that does the reclamation has any chance of being scheduled.

I hope these changes on top of yours make it clearer:

> +void arch_tlbbatch_add_pending(struct arch_tlbflush_unmap_batch *batch,
> +				struct mm_struct *mm,
> +				unsigned long uaddr)
> +{
> +	if (static_cpu_has(X86_FEATURE_INVLPGB) && mm_global_asid(mm)) {
> +		u16 asid = mm_global_asid(mm);
> +		/*
> +		 * Queue up an asynchronous invalidation. The corresponding
> +		 * TLBSYNC is done in arch_tlbbatch_flush(), and must be done
> +		 * on the same CPU.
> +		 */
#if 0 // remove
> +		if (!batch->used_invlpgb) {
> +			batch->used_invlpgb = true;
> +			migrate_disable();
> +		}
#endif
		batch->used_invlpgb = true;
		preempt_disable();
> +		invlpgb_flush_user_nr_nosync(kern_pcid(asid), uaddr, 1, false);
> +		/* Do any CPUs supporting INVLPGB need PTI? */
> +		if (static_cpu_has(X86_FEATURE_PTI))
> +			invlpgb_flush_user_nr_nosync(user_pcid(asid), uaddr, 1, false);
		this_cpu_write(cpu_tlbstate.pending_tlbsync, true);
		preempt_enable();
> +
> +		/*
> +		 * Some CPUs might still be using a local ASID for this
> +		 * process, and require IPIs, while others are using the
> +		 * global ASID.
> +		 *
> +		 * In this corner case we need to do both the broadcast
> +		 * TLB invalidation, and send IPIs. The IPIs will help
> +		 * stragglers transition to the broadcast ASID.
> +		 */
> +		if (READ_ONCE(mm->context.asid_transition))
> +			goto also_send_ipi;
> +	} else {
> +also_send_ipi:
> +		inc_mm_tlb_gen(mm);
> +		cpumask_or(&batch->cpumask, &batch->cpumask, mm_cpumask(mm));
> +	}
> +	mmu_notifier_arch_invalidate_secondary_tlbs(mm, 0, -1UL);
> +}
> +

Then, in switch_mm_irqs_off(), add:

	if (this_cpu_read(cpu_tlbstate.pending_tlbsync))
		tlbsync();

Note that when switch_mm_irqs_off() is called from context_switch() during
a context switch, finish_task_switch() has not yet taken place, so the task
cannot be scheduled on other cores.