> On May 16, 2023, at 7:38 AM, Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
>
> There is a world outside of x86, but even on x86 it's borderline silly
> to take the whole TLB out when you can flush 3 TLB entries one by one
> with exactly the same number of IPIs, i.e. _one_. No?

I just want to re-raise points that were made in the past, including in
the discussion I sent before, which match my experience. Feel free to
reject them, but I think you should not ignore them.

In a nutshell, there is a non-trivial tradeoff. Tracking the exact
ranges that need to be flushed can require, especially with IPI-based
TLB invalidation, additional logic and more cache lines that have to
travel between the cores' caches.

The latter - the cache lines that hold the ranges to be flushed - are
the main issue. They can induce overhead that negates the benefit if it
turns out that in most cases many pages are flushed. Data structures
such as linked lists are therefore not suitable for holding the ranges
to be flushed, as they are not cache-friendly. The data transferred
between the cores to indicate which ranges should be flushed would
ideally be cache-line aligned and fit into a single cache line (see the
sketch at the end of this mail).

It is possible that for kernel ranges, where the stride is always the
base-page size (4KB on x86), you can come up with a more condensed way
of communicating TLB flush ranges than for userspace pages. Perhaps the
workload characteristics are different. But note that major parts of
the rationale behind the changes you suggest could also apply to TLB
invalidations of userspace mappings, as done by the tlb_gather and UBC
mechanisms. In those cases the rationale, at least on x86, was that
since the CPU performs TLB refills very efficiently, the extra
complexity and overhead are likely not worth the trouble.

I hope my feedback is useful. Here is again a link to a discussion from
2015 on this subject:

https://lore.kernel.org/all/CA+55aFwVUkdaf0_rBk7uJHQjWXu+OcLTHc6FKuCn0Cb2Kvg9NA@xxxxxxxxxxxxxx/

There are several patches that showed the benefit of reducing cache
contention during TLB shootdown. Here is one, for example:

https://lore.kernel.org/all/20190423065706.15430-1-namit@xxxxxxxxxx/
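
To make the single-cache-line point concrete, here is a minimal
user-space sketch. It is not code from any of the patches above; the
struct layout, the FLUSH_MAX_RANGES constant and the 64-byte line size
are assumptions for illustration only. The idea is that the flush
request that travels between cores is one fixed-size, cache-line
aligned blob, with a full-flush fallback when the ranges do not fit:

/* Sketch only: illustrates keeping the flush request within a single
 * cache line so the remote core pays one cache-line transfer to learn
 * what to flush. All names and sizes here are assumptions. */
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define CACHE_LINE_SIZE		64	/* assumed x86 line size */
#define FLUSH_MAX_RANGES	3	/* what still fits in one line */

struct flush_request {
	unsigned char nr;		/* valid entries in range[]   */
	bool flush_all;			/* fallback: flush everything */
	struct {
		unsigned long start;	/* inclusive */
		unsigned long end;	/* exclusive */
	} range[FLUSH_MAX_RANGES];
} __attribute__((aligned(CACHE_LINE_SIZE)));

/* The whole request must not spill into a second cache line. */
static_assert(sizeof(struct flush_request) <= CACHE_LINE_SIZE,
	      "flush request spills into a second cache line");

int main(void)
{
	struct flush_request req = {
		.nr = 2,
		.range = {
			{ 0xffffc90000000000UL, 0xffffc90000001000UL },
			{ 0xffffc90000003000UL, 0xffffc90000004000UL },
		},
	};

	/* Stand-in for what the IPI handler would do with the request. */
	if (req.flush_all || req.nr > FLUSH_MAX_RANGES) {
		puts("fallback: flush the whole TLB");
	} else {
		for (unsigned int i = 0; i < req.nr; i++)
			printf("flush [%#lx, %#lx)\n",
			       req.range[i].start, req.range[i].end);
	}
	return 0;
}

Whether three ranges per line is the right number obviously depends on
how the stride is encoded; the point is only that the remote core reads
exactly one cache line, instead of chasing a linked list across many.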