> On Oct 7, 2021, at 5:13 AM, David Hildenbrand <david@xxxxxxxxxx> wrote: > > On 25.09.21 22:54, Nadav Amit wrote: >> From: Nadav Amit <namit@xxxxxxxxxx> >> Currently, using mprotect() to unprotect a memory region or uffd to >> unprotect a memory region causes a TLB flush. At least on x86, as >> protection is promoted, no TLB flush is needed. >> Add an arch-specific pte_may_need_flush() which tells whether a TLB >> flush is needed based on the old PTE and the new one. Implement an x86 >> pte_may_need_flush(). >> For x86, PTE protection promotion or changes of software bits does >> require a flush, also add logic that considers the dirty-bit. Changes to >> the access-bit do not trigger a TLB flush, although architecturally they >> should, as Linux considers the access-bit as a hint. > > Is the added LOC worth the benefit? IOW, do we have some benchmark that really benefits from that? So you ask whether the added ~10 LOC (net) worth the benefit? Let’s start with the cost of this patch. If you ask about complexity, I think that it is a rather simple patch and documented as needed. Please be more concrete if you think otherwise. If you ask about the runtime overhead, my experience is that such code, which mostly does bit operations, has negligible cost. The execution time of mprotect code, and other similar pieces of code, is mostly dominated by walking the page-tables & getting the pages (which might require cold or random memory accesses), acquiring the locks, and of course the TLB flushes that this patch tries to eliminate. As for the benefit: TLB flush on x86 of a single PTE has an overhead of ~200 cycles. If a TLB shootdown is needed, for instance on multithreaded applications, this overhead can grow to few microseconds or even more, depending on the number of sockets, whether the workload runs in a VM (and worse if CPUs are overcommitted) and so on. This overhead is completely unnecessary on many occasions. If you run mprotect() to add permissions, or as I noted in my case, to do something similar using userfaultfd. Note that the potentially unnecessary TLB flush/shootdown takes place while you hold the mmap-lock for write in the case of mprotect(), thereby potentially preventing other threads from making progress during that time. On my in-development workload it was a considerable overhead (I didn’t collect numbers though). Basically, I track dirty pages using uffd, and every page-fault that can be easily resolved by unprotecting cause a TLB flush/shootdown. If you want, I will write a microbenchmarks and give you numbers. If you look for further optimizations (although you did not indicate so), such as doing the TLB batching from do_mprotect_key(), (i.e. batching across VMAs), we can discuss it and apply it on top of these patches.