On Mon, 2025-01-06 at 14:04 +0100, Jann Horn wrote:
> On Sat, Jan 4, 2025 at 3:55 AM Rik van Riel <riel@xxxxxxxxxxx> wrote:
> >
> > Then the only change needed to switch_mm_irqs_off
> > would be to move the LOADED_MM_SWITCHING line to
> > before choose_new_asid, to fully close the window.
> >
> > Am I overlooking anything here?
>
> I think that might require having a full memory barrier in
> switch_mm_irqs_off to ensure that the write of LOADED_MM_SWITCHING
> can't be reordered after reads in choose_new_asid(). Which wouldn't
> be very nice; we probably should avoid adding heavy barriers to the
> task switch path...
>
> Hmm, but I think luckily the cpumask_set_cpu() already implies a
> relaxed RMW atomic, which I think on X86 is actually the same as a
> sequentially consistent atomic, so as long as you put the
> LOADED_MM_SWITCHING line before that, it might do the job? Maybe with
> an smp_mb__after_atomic() and/or an explainer comment.
> (smp_mb__after_atomic() is a no-op on x86, so maybe just a comment is
> the right way. Documentation/memory-barriers.txt says
> smp_mb__after_atomic() can be used together with atomic RMW bitop
> functions.)

That no-op smp_mb__after_atomic() might be the way to go,
since we do not actually use the mm_cpumask with INVLPGB,
and we could conceivably skip updates to the bitmask
for tasks using broadcast TLB flushing.
(A rough sketch of that ordering is appended below.)

> > I'll add the READ_ONCE.
> >
> > Will the race still exist if we wait on
> > LOADED_MM_SWITCHING as proposed above?
>
> I think so, since between reading the loaded_mm and reading the
> loaded_mm_asid, the remote CPU might go through an entire task
> switch. Like:
>
> 1. We read the loaded_mm, and see that the remote CPU is currently
> running in our mm_struct.
> 2. The remote CPU does a task switch to another process with a
> different mm_struct.
> 3. We read the loaded_mm_asid, and see an ASID that does not match
> our broadcast ASID (because the loaded ASID is not for our
> mm_struct).

A false positive, where we do not clear the asid_transition
field and will check again in the future, should be harmless,
though.

The worry is false negatives, where we fail to detect an
out-of-sync CPU, yet still clear the asid_transition field.
(A sketch of that check is appended below as well.)

--
All Rights Reversed.
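
For illustration, the ordering discussed above could look roughly like
the fragment below. This is only a sketch, not the actual patch: the
surrounding lines are paraphrased from the existing mm-switch path in
arch/x86/mm/tlb.c, the comments are illustrative, and next, cpu,
next_tlb_gen, new_asid and need_flush are assumed to be the local
variables already present in switch_mm_irqs_off().

	/*
	 * Sketch: publish the "switching" state before picking the new
	 * ASID, so a concurrent asid_transition check can wait for this
	 * CPU instead of sampling a half-updated mm/ASID pair.
	 */
	this_cpu_write(cpu_tlbstate.loaded_mm, LOADED_MM_SWITCHING);

	/*
	 * cpumask_set_cpu() is an atomic RMW, which on x86 acts as a
	 * full barrier, so the store above cannot be reordered after
	 * the reads in choose_new_asid().  smp_mb__after_atomic() is a
	 * no-op on x86 and mainly documents that ordering requirement.
	 */
	if (next != &init_mm)
		cpumask_set_cpu(cpu, mm_cpumask(next));
	smp_mb__after_atomic();

	next_tlb_gen = atomic64_read(&next->context.tlb_gen);

	choose_new_asid(next, next_tlb_gen, &new_asid, &need_flush);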
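
And, in the same spirit, a rough sketch of the remote-side check that
the false positive / false negative discussion refers to. The function
name clear_asid_transition_if_done() is made up for illustration,
broadcast_asid stands for this mm's broadcast ASID, and the
asid_transition field is assumed to sit in mm->context; the real code
in the series may be structured differently.

/* Hypothetical helper, for illustration only; not the series' code. */
static void clear_asid_transition_if_done(struct mm_struct *mm, u16 broadcast_asid)
{
	int cpu;

	/*
	 * Only clear asid_transition once no CPU in mm_cpumask is still
	 * running this mm with a non-broadcast ASID.  Bailing out early
	 * (a false positive) just means we check again on a later flush;
	 * clearing the flag while such a CPU still exists (a false
	 * negative) would be the real bug.
	 */
	for_each_cpu(cpu, mm_cpumask(mm)) {
		struct mm_struct *loaded_mm;

		/* Wait out a CPU that is mid switch_mm_irqs_off(). */
		do {
			loaded_mm = READ_ONCE(per_cpu(cpu_tlbstate.loaded_mm, cpu));
		} while (loaded_mm == LOADED_MM_SWITCHING);

		if (loaded_mm != mm)
			continue;

		/*
		 * The CPU may have context switched between the two
		 * reads; a mismatched ASID here is then the harmless
		 * false positive from the scenario above, and
		 * asid_transition stays set.
		 */
		if (READ_ONCE(per_cpu(cpu_tlbstate.loaded_mm_asid, cpu)) != broadcast_asid)
			return;
	}

	/* No straggler found; the switch to the broadcast ASID is complete. */
	WRITE_ONCE(mm->context.asid_transition, false);
}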