On Wed, 2025-01-22 at 09:38 +0100, Peter Zijlstra wrote:
>
> Looking at this more... I'm left wondering, did 'we' look at any other
> architecture code at all?
>
> For example, look at arch/arm64/mm/context.c and see how their reset
> works. Notably, they are not at all limited to reclaiming free'd ASIDs,
> but will very aggressively take back all ASIDs except for the current
> running ones.
>

I did look at the ARM64 code, and while their reset is much nicer,
it looks like that comes at a cost on each process at context switch
time.

In new_context(), there is a call to check_update_reserved_asid(),
which will iterate over all CPUs to check whether this process's
ASID is part of the reserved list that got carried over during
the rollover.

I don't know if that would scale well enough to work on systems
with thousands of CPUs.

> If we want to move towards relying on broadcast TBLI, we'll need to
> go in that direction.

For single threaded processes, which are still very common, a local
flush would likely be faster than broadcast flushes, even if multiple
broadcast flushes can be pending simultaneously.

For very large systems with a large number of processes, I agree we
want to move in that direction, but we may need to figure out whether
or not everybody taking the cpu_asid_lock at rollover time, and then
scanning all other CPUs from check_update_reserved_asid(), with the
lock held, would scale to systems with thousands of CPUs.

Everybody taking the cpu_asid_lock would probably be fine, if they
didn't all have to scan over all the CPUs.

If we can figure out a more scalable way to do the new_context()
stuff, this would definitely be the way to go.

-- 
All Rights Reversed.