On Mon, Dec 21, 2020 at 10:31:57AM -0800, Nadav Amit wrote:
> > On Dec 21, 2020, at 9:27 AM, Peter Xu <peterx@xxxxxxxxxx> wrote:
> >
> > Hi, Nadav,
> >
> > On Sun, Dec 20, 2020 at 12:06:38AM -0800, Nadav Amit wrote:
> >
> > [...]
> >
> >> So to correct myself, I think that what I really encountered was actually
> >> during MM_CP_UFFD_WP_RESOLVE (i.e., when the protection is removed). The
> >> problem was that in this case the “write”-bit was removed during unprotect.
> >> Sorry for the strange formatting to fit within 80 columns:
> >
> > I assume I can ignore the race mentioned in the commit message but only refer
> > to this one below.  However I'm still confused.  Please see below.
> >
> >> [ Start: PTE is writable ]
> >>
> >> cpu0                       cpu1                  cpu2
> >> ----                       ----                  ----
> >>                                                  [ Writable PTE
> >>                                                    cached in TLB ]
> >
> > Here cpu2 got writable pte in tlb.  But why?
> >
> > If below is an unprotect, it means it must have been protected once by
> > userfaultfd, right?  If so, the previous change_protection_range() which did
> > the wr-protect should have done a tlb flush already before it returns (since
> > pages>0 - we protected one pte at least).  Then I can't see why cpu2 tlb has
> > stale data.
>
> Thanks, Peter. Just as you can munprotect() a region which was not protected
> before, you can uffd-unprotect a region that was not protected before. It
> might be that the user tried to unprotect a large region, which was
> partially protected and partially unprotected.
>
> The selftest obviously blindly unprotects some regions to check for bugs.
>
> So to your question - it was not write-protected (think about initial copy
> without write-protecting).

If that's the only case, how about we don't touch the ptes at all?
Instead of playing with preserve_write, I'm thinking something like this right
before ptep_modify_prot_start(), even for uffd_wp==true:

        if (uffd_wp && pte_uffd_wp(old_pte)) {
                WARN_ON_ONCE(pte_write(old_pte));
                continue;
        }

        if (uffd_wp_resolve && !pte_uffd_wp(old_pte))
                continue;

Then we can also avoid the heavy operations on changing ptes back and forth.

>
> > If I assume cpu2 doesn't have that cached tlb, then "write to old page" won't
> > happen either, because cpu1/cpu2 will all go through the cow path and pgtable
> > lock should serialize them.
> >
> >> userfaultfd_writeprotect()
> >> [ write-*unprotect* ]
> >> mwriteprotect_range()
> >> mmap_read_lock()
> >> change_protection()
> >>
> >> change_protection_range()
> >> ...
> >> change_pte_range()
> >> [ *clear* “write”-bit ]
> >> [ defer TLB flushes]
> >>                            [ page-fault ]
> >>                            …
> >>                            wp_page_copy()
> >>                             cow_user_page()
> >>                              [ copy page ]
> >>                                                  [ write to old
> >>                                                    page ]
> >>                            …
> >>                             set_pte_at_notify()
> >>
> >> [ End: cpu2 write not copied from old to new page. ]
> >
> > Could you share how to reproduce the problem?  I would be glad to give it a
> > shot as well.
>
> You can run the selftests/userfaultfd with my small patch [1]. I ran it with
> the following parameters: “./userfaultfd anon 100 100”. I think that it is
> more easily reproducible with “mitigations=off idle=poll” as kernel
> parameters.
>
> [1] https://lore.kernel.org/patchwork/patch/1346386/

Thanks.

>
> >
> >> [1] https://lore.kernel.org/patchwork/patch/1346386
> >
> > PS: Sorry to not have read the other series of yours.  It seems to need some
> > chunk of time so I postponed it a bit due to other things; but I'll read at
> > least the fixes very soon.
>
> Thanks again, I will post RFCv2 with some numbers soon.

I read the patch 1/3 of the series.  Would it be better to post them separately
just in case Andrew would like to pick them earlier?

Since you seem to be heavily working on uffd-wp - I do still have a few uffd-wp
fixes locally even for anonymous.  I think they're related to some corner cases
like either thp or migration entry conversions, but anyway I'll see whether I
should post them even earlier (I planned to add smap/pagemap support for uffd-wp
so maybe I can even write some test case to verify some of them).  Just a FYI...

Thanks,

--
Peter Xu
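
The early-skip checks suggested in this message (the two conditions placed
before ptep_modify_prot_start()) can be modelled in user space.  What follows
is only a toy sketch under stated assumptions, not kernel code: struct toy_pte,
toy_change_pte() and toy_demo() are hypothetical names, the PTE is reduced to
the two bits discussed here, and the resolve branch is assumed to do nothing
more than drop the uffd-wp marker.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy PTE with only the two bits discussed in the thread (hypothetical). */
struct toy_pte {
	bool write;	/* hardware write permission */
	bool uffd_wp;	/* userfaultfd write-protect marker */
};

/*
 * Toy stand-in for one iteration of the PTE loop.
 * Returns true if the PTE was modified (a TLB flush would then be needed),
 * false if it was left untouched.
 */
bool toy_change_pte(struct toy_pte *pte, bool uffd_wp, bool uffd_wp_resolve)
{
	if (uffd_wp && pte->uffd_wp) {
		/* Models WARN_ON_ONCE(pte_write(old_pte)). */
		if (pte->write)
			fprintf(stderr, "warning: wr-protected pte is writable\n");
		return false;	/* already protected: skip */
	}

	if (uffd_wp_resolve && !pte->uffd_wp)
		return false;	/* never protected: skip, write bit untouched */

	if (uffd_wp) {
		pte->write = false;
		pte->uffd_wp = true;
		return true;
	}

	if (uffd_wp_resolve) {
		/* Assumption of this model: only the marker is dropped. */
		pte->uffd_wp = false;
		return true;
	}

	return false;	/* no uffd-wp request in this model */
}

/* Self-check: unprotecting a never-protected, writable PTE is a no-op. */
void toy_demo(void)
{
	struct toy_pte pte = { .write = true, .uffd_wp = false };

	assert(!toy_change_pte(&pte, false, true));
	assert(pte.write && !pte.uffd_wp);
}
```

In this model, an unprotect request on a PTE that was never uffd-write-protected
returns without clearing the write bit, so there is no window in which another
CPU could keep writing through a stale writable TLB entry while the page is
being copied.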