On Mon, Dec 21, 2020 at 11:55:02AM -0800, Linus Torvalds wrote: > On Mon, Dec 21, 2020 at 11:16 AM Yu Zhao <yuzhao@xxxxxxxxxx> wrote: > > > > Nadav Amit found memory corruptions when running userfaultfd test above. > > It seems to me the problem is related to commit 09854ba94c6a ("mm: > > do_wp_page() simplification"). Can you please take a look? Thanks. > > > > TL;DR: it may not safe to make copies of singly mapped (non-COW) pages > > when it's locked or has additional ref count because concurrent > > clear_soft_dirty or change_pte_range may have removed pte_write but yet > > to flush tlb. > > Hmm. The TLB flush shouldn't actually matter, because anything that > changes the writable bit had better be serialized by the page table > lock. Well, unfortunately we have places that use optimizations like inc_tlb_flush_pending() lock page table pte_wrprotect flush_tlb_range() dec_tlb_flush_pending() which complicate things. And usually checking mm_tlb_flush_pending() in addition to pte_write() (while holding page table lock) would fix the similar problems. But for this one, doing so apparently isn't as straightforward or the best solution. > Yes, we often load the page table value without holding the page table > lock (in order to know what we are going to do), but then before we > finalize the operation, we then re-check - undet the page table lock - > that the value we loaded still matches. > > But I think I see what *MAY* be going on. The userfaultfd > mwriteprotect_range() code takes the mm lock for _reading_. Which > means that you can have > > Thread A Thread B > > - fault starts. Sees write-protected pte, allocates memory, copies data > > - userfaultfd makes the regions writable > > - usefaultfd case writes to the region > > - userfaultfd makes region non-writable > > - fault continues, gets the page table lock, sees that the pte is the > same, uses old copied data > > But if this is what's happening, I think it's a userfaultfd bug. I > think the mmap_read_lock(dst_mm) in mwriteprotect_range() needs to be > a mmap_write_lock(). > > mprotect() does this right, it looks like userfaultfd does not. You > cannot just change the writability of a page willy-nilly without the > correct locking. > > Maybe there are other causes, but this one stands out to me as one > possible cause. > > Comments? > > Linus