On Wed, Dec 23, 2020 at 07:09:10PM -0800, Nadav Amit wrote:
> > On Dec 23, 2020, at 6:00 PM, Andrea Arcangeli <aarcange@xxxxxxxxxx> wrote:
> >
> > On Wed, Dec 23, 2020 at 05:21:43PM -0800, Andy Lutomirski wrote:
> >> I don’t love this as a long term fix. AFAICT we can have
> >> mm_tlb_flush_pending set for quite a while — mprotect seems like it
> >> can wait in IO while splitting a huge page, for example. That gives
> >> us a window in which every write fault turns into a TLB flush.
> >
> > mprotect can't run concurrently with a page fault in the first place.
> >
> > One other near zero cost improvement easy to add if this would be "if
> > (vma->vm_flags & (VM_SOFTDIRTY|VM_UFFD_WP))" and it could be made
> > conditional to the two config options too.
> >
> > Still I don't mind doing it in some other way, uffd-wp has much easier
> > time doing it in another way in fact.
> >
> > Whatever performs better is fine, but queuing up pending invalidate
> > ranges don't look very attractive since it'd be a fixed cost that we'd
> > always have to pay even when there's no fault (and there can't be any
> > fault at least for mprotect).
>
> I think there are other cases in which Andy’s concern is relevant
> (MADV_PAGEOUT).

That patch only demonstrates a rough idea, and I should have elaborated:
if we ever decide to go in that direction, we only need to worry about
"jumping through hoops", because the final patch (set) I have in mind
would include not only the build-time optimization Andrea suggested but
also runtime optimizations like skipping the do_swap_page() path and
(!PageAnon() || page_mapcount > 1). Rest assured, the performance impact
on do_wp_page() of an occasional additional TLB flush on top of a page
copy is negligible.

> Perhaps holding some small bitmap based on part of the deferred flushed
> pages (e.g., bits 12-17 of the address or some other kind of a single
> hash-function bloom-filter) would be more performant to avoid (most)
> unnecessary TLB flushes.
> It will be cleared before a TLB flush and set while
> holding the PTL.
>
> Checking if a flush is needed, under the PTL, would require a single memory
> access (although potentially cache miss). It will however require one atomic
> operation for each page-table whose PTEs’ flushes are deferred - in contrast
> to the current scheme which requires two atomic operations for the *entire*
> operation.
>
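For illustration only, the single-hash filter described in the quote above might look roughly like the userspace C11 sketch below. The struct and function names (deferred_flush_filter, dff_*), the 64-bucket layout taken from address bits 12-17 and the 4 KiB page size are assumptions, not code from any posted patch. It also does not model the kernel's PTL or the ordering between clearing the filter, issuing the flush and concurrent faults; it only shows the cost structure: one atomic OR per page table marked (the mask is accumulated locally first) and a single load on the fault-path check.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed page size for the sketch (hypothetical, 4 KiB pages). */
#define DFF_PAGE_SIZE 4096u

/*
 * Hypothetical per-mm filter: one bit per value of address bits 12-17,
 * i.e. 64 buckets. A set bit means "some page hashing to this bucket may
 * still have a stale TLB entry because its flush was deferred".
 * False positives are possible; false negatives are not.
 */
struct deferred_flush_filter {
	_Atomic uint64_t bits;
};

/* Single hash function: bits 12-17 of the virtual address (0..63). */
static inline unsigned int dff_bucket(uintptr_t addr)
{
	return (unsigned int)((addr >> 12) & 0x3f);
}

/*
 * Mark every page in [start, end) of one page table as having a deferred
 * flush. In the idea being sketched this runs while holding the PTL; the
 * mask is built locally so the whole page table costs one atomic OR.
 */
static void dff_mark_range(struct deferred_flush_filter *f,
			   uintptr_t start, uintptr_t end)
{
	uint64_t mask = 0;

	for (uintptr_t addr = start; addr < end; addr += DFF_PAGE_SIZE)
		mask |= UINT64_C(1) << dff_bucket(addr);
	atomic_fetch_or(&f->bits, mask);
}

/* Fault-path check: a single memory load (possibly a cache miss). */
static bool dff_may_need_flush(struct deferred_flush_filter *f,
			       uintptr_t addr)
{
	return (atomic_load(&f->bits) >> dff_bucket(addr)) & 1;
}

/* Reset the filter; per the quoted description, done before the TLB flush. */
static void dff_clear(struct deferred_flush_filter *f)
{
	atomic_store(&f->bits, 0);
}
```

Marking the two pages at 0x200000 and 0x201000 sets buckets 0 and 1, so a later check at 0x240000 (which also hashes to bucket 0) reports a possibly unnecessary flush; that is the false-positive cost a single-hash filter trades for its tiny footprint.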