On Sun, Jan 9, 2022 at 12:49 AM Nadav Amit <nadav.amit@xxxxxxxxx> wrote:
>
> I do not know whether it is a pure win, but there is a tradeoff.

Hmm. I guess only some serious testing would tell.

On x86, I'd be a bit worried about removing lazy TLB simply because even with ASID support there (called PCIDs by Intel for NIH reasons), the actual ASID space on x86 was at least originally very very limited.

Architecturally, x86 may expose 12 bits of ASID space, but iirc at least the first few implementations actually only internally had one or two bits, and hashed the 12 bits down to that internal very limited hardware TLB ID space.

We only use a handful of ASIDs per CPU on x86 partly for this reason (but also since there's no remote hardware TLB shootdown, there's no reason to have a bigger global ASID space, so ASIDs aren't _that_ common).

And I don't know how many non-PCID x86 systems (perhaps virtualized?) there might be out there.

But it would be very interesting to test some "disable lazy tlb" patch.

The main problem workloads tend to be IO, and I'm not sure how many of the automated performance tests would catch issues. I guess some threaded pipe ping-pong test (with each thread pinned to different cores) would show it.

And I guess there is some load that triggered the original powerpc patch by Nick&co, and that Andy has been using..

Anybody willing to cook up a patch and run some benchmarks? Perhaps one that basically just replaces "set ->mm to NULL" with "set ->mm to &init_mm" - so that the lazy TLB code is still *there*, but it never triggers..

I think it's mainly 'copy_thread()' in kernel/fork.c and the 'init_mm' initializer in mm/init-mm.c, but there's probably other things too that have that knowledge of the special "tsk->mm = NULL" situation.

              Linus
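
A minimal sketch of the kind of threaded pipe ping-pong test mentioned above: two threads pinned to different cores bounce a single byte over a pair of pipes and report round trips per second, as a rough proxy for context-switch / mm-switch overhead. The CPU numbers (0 and 1), the iteration count and the pthread_setaffinity_np() pinning are illustrative choices, not from the mail itself.

/*
 * Threaded pipe ping-pong microbenchmark (sketch).
 * Build: gcc -O2 -pthread pingpong.c -o pingpong
 */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

#define ITERATIONS 1000000

static int ab[2], ba[2];        /* pipe A->B and pipe B->A */

static void pin_to_cpu(int cpu)
{
        cpu_set_t set;

        CPU_ZERO(&set);
        CPU_SET(cpu, &set);
        if (pthread_setaffinity_np(pthread_self(), sizeof(set), &set))
                fprintf(stderr, "pthread_setaffinity_np(%d) failed\n", cpu);
}

static void *pong(void *arg)
{
        char c;

        pin_to_cpu(1);
        /* Echo every byte straight back to the other thread. */
        for (int i = 0; i < ITERATIONS; i++) {
                if (read(ab[0], &c, 1) != 1)
                        break;
                if (write(ba[1], &c, 1) != 1)
                        break;
        }
        return NULL;
}

int main(void)
{
        pthread_t thread;
        struct timespec start, end;
        char c = 'x';
        double secs;

        if (pipe(ab) || pipe(ba)) {
                perror("pipe");
                return 1;
        }

        pin_to_cpu(0);
        pthread_create(&thread, NULL, pong, NULL);

        clock_gettime(CLOCK_MONOTONIC, &start);
        for (int i = 0; i < ITERATIONS; i++) {
                /* One round trip: write to the pong thread, wait for the echo. */
                if (write(ab[1], &c, 1) != 1 || read(ba[0], &c, 1) != 1) {
                        perror("ping");
                        return 1;
                }
        }
        clock_gettime(CLOCK_MONOTONIC, &end);
        pthread_join(thread, NULL);

        secs = (end.tv_sec - start.tv_sec) + (end.tv_nsec - start.tv_nsec) / 1e9;
        printf("%d round trips in %.3f s (%.0f/s)\n",
               ITERATIONS, secs, ITERATIONS / secs);
        return 0;
}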
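
And purely as an illustration of the "->mm = &init_mm" idea, a rough, untested sketch against copy_mm() in kernel/fork.c, which is the helper that currently leaves kernel threads with a NULL ->mm. The reference counting and all of the other places that key off "tsk->mm == NULL" (exit, exec, the scheduler's lazy-TLB checks, mm/init-mm.c) are deliberately hand-waved here; those are exactly the "other things" caveat above.

--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ static int copy_mm(unsigned long clone_flags, struct task_struct *tsk)
 	oldmm = current->mm;
-	if (!oldmm)
+	if (!oldmm) {
+		/*
+		 * Kernel thread: give it a real reference to init_mm instead
+		 * of the special NULL ->mm that makes the scheduler take the
+		 * lazy-TLB "borrow the previous mm" path.  The matching
+		 * reference drop on exit, and every other test for a NULL
+		 * ->mm, still needs auditing.
+		 */
+		mmgrab(&init_mm);
+		tsk->mm = &init_mm;
+		tsk->active_mm = &init_mm;
 		return 0;
+	}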