On Tue, Jun 28, 2022 at 08:30:46PM +0000, Nadav Amit wrote: > > > > On Jun 28, 2022, at 12:15 PM, Peter Xu <peterx@xxxxxxxxxx> wrote: > > > > ⚠ External Email > > > > On Mon, Jun 27, 2022 at 11:37:32PM +0000, Nadav Amit wrote: > >> > >> Besides the benefit of avoiding a TLB flush, there is also the benefit > >> of having more precise dirty tracking. You assume UFFDIO_CONTINUE will be > >> preceded by memory write to the shared memory, but that does not have to > >> be the case. Similarly, if in the future userfaultfd would also support > >> memory-backed private mappings, that does not have to be the case either. > > > > Could I ask what's the not supported private mapping you're talking about > > (and I think you meant "file-backed")? I thought all kinds of private > > mappings are supported on all modes already? > > Yes, I meant file-backed. See vma_can_userfault() for the supported > types of memory: > > static inline bool vma_can_userfault(struct vm_area_struct *vma, > unsigned long vm_flags) > { > if (vm_flags & VM_UFFD_MINOR) > return is_vm_hugetlb_page(vma) || vma_is_shmem(vma); > > #ifndef CONFIG_PTE_MARKER_UFFD_WP > /* > * If user requested uffd-wp but not enabled pte markers for > * uffd-wp, then shmem & hugetlbfs are not supported but only > * anonymous. > */ > if ((vm_flags & VM_UFFD_WP) && !vma_is_anonymous(vma)) > return false; > #endif > return vma_is_anonymous(vma) || is_vm_hugetlb_page(vma) || > vma_is_shmem(vma); > } I'm confused. Let me ask in another way: do you mean !VM_SHARED when you say "private"? Both shmem & hugetlb support private mappings for all three modes, afaict. > > Well I never thought anyone would be using soft-dirty with uffd because > > logically uffd-wp was somehow trying to replace it with a better interface, > > hopefully. > > I have heard about some who does (not me). So I do not make it up, > unfortunately. If in any form you can get the reason of using soft-dirty in that use case please kindly share more. I think it could be that uffd-wp is not working as good as soft-dirty in some way but we can think about it when there's a valid use case, so ultimately it's more possible for a full replacement. Sync messages can be a problem and they're indeed slow, soft-dirty is by nature async. But still I'd like to check in case you know. > > > > > If to talk about it, IMHO the most important thing is really not the dirty > > bit but the write bit: even if you leave both the dirty bits cleared (so > > the user specified !WRITE_HINT then we grant write bit only), it means when > > the write happens for sure dirty bit will be set on demand but not > > soft-dirty since we won't generate a page fault at all. AFAICT it'll be a > > false negative. > > > > The old code is safe in that respect on that we always set dirty so we can > > only have false positive (which is acceptable in this case) but not false > > negative (which is not). > > I agree that the PTE should be left RO in this case. I was just pointing > that the user might not be aware of soft-dirty being used, and then it > makes sense to treat the (lack of) write-hint appropriately. Yes please. Thanks for raising this topic btw, I never thought about that side effect. [...] > > Another option is we call can_change_pte_writable() somehow in the > > unprotect branch. > > I don’t think that I got this one. No worry, feel free to ignore it. It's probably not ideal anyway because we need an extra pte_mkwrite() there too when can_change_pte_writable() returns true. > tl;dr: I’ll run some numbers (access when PTE-dirty/clear), address the > aforementioned issues and send v2. Based on the results we can decide > whether it makes sense. That'll be very appreciated. Thanks a lot! -- Peter Xu