On Thu, Oct 27, 2022 at 1:15 PM Nadav Amit <nadav.amit@xxxxxxxxx> wrote:
>
> I think it might be easier to come up with new rules instead of phrasing the
> existing ones.

I'm ok with that, but I think you are missing a very important issue:
all the cases where we can short-circuit TLB invalidations *entirely*.

You don't mention those at all.

Those optimizations are *very* important. Process exit is one of the
most performance-critical pieces of code in the kernel on some loads,
because a lot of traditional unix loads have a *ton* of small
fork/exec/exit sequences, and the whole "do just one TLB flush" was at
least historically quite a big deal.

So one very big issue here is when zap_page_tables() can end up
skipping TLB flushes entirely, because nobody cares.

And no, the fix is not to turn it into some "just increment a
generation number". We want to avoid *even that* cost for the whole
"we don't actually need a TLB flush at all, until we actually free the
pages".

So there are two levels of tlb flush optimizations

 (a) avoiding them entirely in the first place

 (b) the whole "once you have to flush, keep track of lazy modes and
     TLB generations, and flush ranges"

And honestly, I think you ignored (a), and that's where we do exactly
those kinds of "this case doesn't need to flush AT ALL" things.

So when you say

> The thing I like about this scheme
> the most is that it avoids relying on almost all the OS data-structures
> (e.g., PageAnon()), making it much easier to grasp.

I think it's because you've ignored a big part of the whole issue.

              Linus
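
To make the split between (a) and (b) concrete, here is a rough toy model in plain userspace C. It is only a sketch: every name in it (toy_gather, toy_zap, toy_flush, toy_free_pending, BATCH_MAX) is made up for illustration and is not a kernel API, and the "flush" is just a counter. The point it tries to show is that tearing down mappings does no flush work at all, not even a generation bump, until pages are about to be freed, and that a teardown which never removed a live translation never flushes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model; none of these names exist in the kernel. */
#define BATCH_MAX 8

struct toy_gather {
    bool     need_flush;          /* a live translation was removed */
    size_t   nr_pending;          /* pages unmapped but not yet freed */
    int      pending[BATCH_MAX];  /* made-up page identifiers */
    unsigned flushes;             /* number of simulated TLB flushes */
    unsigned freed;               /* number of pages handed back */
};

/* Level (a): if nothing stale can be cached, do no work at all. */
static void toy_flush(struct toy_gather *tg)
{
    if (!tg->need_flush)
        return;
    /* Level (b) would live here: pick ranges, generations, lazy modes. */
    tg->flushes++;
    tg->need_flush = false;
}

/* Pages may only be reused after any stale translations are gone. */
static void toy_free_pending(struct toy_gather *tg)
{
    toy_flush(tg);
    tg->freed += (unsigned)tg->nr_pending;
    tg->nr_pending = 0;
}

/* Remove one mapping; the flush is deferred until the batch is freed. */
static void toy_zap(struct toy_gather *tg, int page, bool was_present)
{
    if (!was_present)
        return;                   /* never mapped: nothing to flush */
    tg->need_flush = true;
    tg->pending[tg->nr_pending++] = page;
    if (tg->nr_pending == BATCH_MAX)
        toy_free_pending(tg);     /* batch full: one flush, then free */
}

/* Usage: tear down n present pages, return how many flushes it took. */
static unsigned toy_demo_exit(int n)
{
    struct toy_gather tg = {0};
    for (int i = 0; i < n; i++)
        toy_zap(&tg, i, true);
    toy_free_pending(&tg);
    return tg.flushes;
}
```

In this model, zapping five present pages costs one simulated flush, paid only when the pages are freed, and zapping entries that were never present costs none. A scheme that bumps a generation counter on every zap would add per-zap work that this structure avoids.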