Re: [PATCH] mm/userfaultfd: fix memory corruption due to writeprotect

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Dec 21, 2020 at 11:55:02AM -0800, Linus Torvalds wrote:
> On Mon, Dec 21, 2020 at 11:16 AM Yu Zhao <yuzhao@xxxxxxxxxx> wrote:
> >
> > Nadav Amit found memory corruptions when running userfaultfd test above.
> > It seems to me the problem is related to commit 09854ba94c6a ("mm:
> > do_wp_page() simplification"). Can you please take a look? Thanks.
> >
> > TL;DR: it may not safe to make copies of singly mapped (non-COW) pages
> > when it's locked or has additional ref count because concurrent
> > clear_soft_dirty or change_pte_range may have removed pte_write but yet
> > to flush tlb.
> 
> Hmm. The TLB flush shouldn't actually matter, because anything that
> changes the writable bit had better be serialized by the page table
> lock.

Well, unfortunately we have places that use optimizations like

  inc_tlb_flush_pending()
    lock page table
      pte_wrprotect
  flush_tlb_range()
  dec_tlb_flush_pending()

which complicate things. And usually checking mm_tlb_flush_pending()
in addition to pte_write() (while holding page table lock) would fix
the similar problems. But for this one, doing so apparently isn't as
straightforward or the best solution.

> Yes, we often load the page table value without holding the page table
> lock (in order to know what we are going to do), but then before we
> finalize the operation, we then re-check - undet the page table lock -
> that the value we loaded still matches.
> 
> But I think I see what *MAY* be going on.  The userfaultfd
> mwriteprotect_range() code takes the mm lock for _reading_. Which
> means that you can have
> 
> Thread A     Thread B
> 
>  - fault starts. Sees write-protected pte, allocates memory, copies data
> 
>                    - userfaultfd makes the regions writable
> 
>                    - usefaultfd case writes to the region
> 
>                    - userfaultfd makes region non-writable
> 
>  - fault continues, gets the page table lock, sees that the pte is the
> same, uses old copied data
> 
> But if this is what's happening, I think it's a userfaultfd bug. I
> think the mmap_read_lock(dst_mm) in mwriteprotect_range() needs to be
> a mmap_write_lock().
> 
> mprotect() does this right, it looks like userfaultfd does not. You
> cannot just change the writability of a page willy-nilly without the
> correct locking.
> 
> Maybe there are other causes, but this one stands out to me as one
> possible cause.
> 
> Comments?
> 
>               Linus



[Index of Archives]     [Linux Kernel]     [Kernel Development Newbies]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite Hiking]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux