Re: [PATCH] mm/userfaultfd: fix memory corruption due to writeprotect

Yu Zhao <yuzhao@xxxxxxxxxx> · Mon, 21 Dec 2020 12:16:21 -0700

On Mon, Dec 21, 2020 at 10:31:57AM -0800, Nadav Amit wrote:
> > On Dec 21, 2020, at 9:27 AM, Peter Xu <peterx@xxxxxxxxxx> wrote:
> > 
> > Hi, Nadav,
> > 
> > On Sun, Dec 20, 2020 at 12:06:38AM -0800, Nadav Amit wrote:
> > 
> > [...]
> > 
> >> So to correct myself, I think that what I really encountered was actually
> >> during MM_CP_UFFD_WP_RESOLVE (i.e., when the protection is removed). The
> >> problem was that in this case the “write”-bit was removed during unprotect.
> >> Sorry for the strange formatting to fit within 80 columns:
> > 
> > I assume I can ignore the race mentioned in the commit message but only refer
> > to this one below.  However I'm still confused.  Please see below.
> > 
> >> [ Start: PTE is writable ]
> >> 
> >> cpu0				cpu1			cpu2
> >> ----				----			----
> >> 							[ Writable PTE 
> >> 							  cached in TLB ]
> > 
> > Here cpu2 got writable pte in tlb.  But why?
> > 
> > If below is an unprotect, it means it must have been protected once by
> > userfaultfd, right?  If so, the previous change_protection_range() which did
> > the wr-protect should have done a tlb flush already before it returns (since
> > pages>0 - we protected one pte at least).  Then I can't see why cpu2 tlb has
> > stale data.
> 
> Thanks, Peter. Just as you can munprotect() a region which was not protected
> before, you can uffd-unprotect a region that was not protected before. It
> might be that the user tried to unprotect a large region, which was
> partially protected and partially unprotected.
> 
> The selftest obviously blindly unprotects some regions to check for bugs.
> 
> So to your question - it was not write-protected (think about initial copy
> without write-protecting).
> 
> > If I assume cpu2 doesn't have that cached tlb, then "write to old page" won't
> > happen either, because cpu1/cpu2 will all go through the cow path and pgtable
> > lock should serialize them.
> > 
> >> userfaultfd_writeprotect()				
> >> [ write-*unprotect* ]
> >> mwriteprotect_range()
> >> mmap_read_lock()
> >> change_protection()
> >> 
> >> change_protection_range()
> >> ...
> >> change_pte_range()
> >> [ *clear* “write”-bit ]
> >> [ defer TLB flushes]
> >> 				[ page-fault ]
> >> 				…
> >> 				wp_page_copy()
> >> 				 cow_user_page()
> >> 				  [ copy page ]
> >> 							[ write to old
> >> 							  page ]
> >> 				…
> >> 				 set_pte_at_notify()
> >> 
> >> [ End: cpu2 write not copied from old to new page. ]
> > 
> > Could you share how to reproduce the problem?  I would be glad to give it a
> > shot as well.
> 
> You can run the selftests/userfaultfd with my small patch [1]. I ran it with
> the following parameters: “./userfaultfd anon 100 100”. I think that it is
> more easily reproducible with “mitigations=off idle=poll” as kernel
> parameters.
> 
> [1] https://lore.kernel.org/patchwork/patch/1346386/

Hi Linus,

Nadav Amit found memory corruptions when running the userfaultfd test above.
It seems to me the problem is related to commit 09854ba94c6a ("mm:
do_wp_page() simplification"). Can you please take a look? Thanks.

TL;DR: it may not be safe to make copies of singly mapped (non-COW)
pages when they are locked or have an additional ref count, because a
concurrent clear_soft_dirty or change_pte_range may have removed
pte_write but not yet flushed the TLB.
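
To make the interleaving in Nadav's diagram concrete, here is a toy
userspace model of it. This is only a sketch, not kernel code: the
toy_pte/toy_tlb structs, run_interleaving() and its flush_before_copy
flag are all made up for illustration, and the flag is not a proposed
fix. The model assumes cpu2 holds a writable TLB entry at the start and
that, once that entry is gone, cpu2 faults and retries its write after
cpu1 has installed the new page.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_PAGE_SZ 8	/* toy page size, unrelated to the real PAGE_SIZE */

/* Hypothetical stand-ins for a PTE and for one CPU's cached translation. */
struct toy_pte { int page; bool write; };
struct toy_tlb { bool valid; int page; bool write; };

static char toy_pages[2][TOY_PAGE_SZ];	/* [0] = old page, [1] = COW copy */

/*
 * Replays the three-CPU sequence from the diagram in program order.
 * Returns true if cpu2's write is visible in the page that is mapped
 * at the end, false if the write was lost.
 */
static bool run_interleaving(bool flush_before_copy)
{
	struct toy_pte pte = { .page = 0, .write = true };
	/* Start: cpu2 has the writable PTE cached in its TLB. */
	struct toy_tlb cpu2_tlb = { .valid = true, .page = 0, .write = true };

	memset(toy_pages, 0, sizeof(toy_pages));

	/* cpu0: change_pte_range() clears the write bit, flush is deferred. */
	pte.write = false;
	if (flush_before_copy)
		cpu2_tlb.valid = false;

	/* cpu1: write fault on the now read-only PTE, copy the page. */
	assert(!pte.write);
	memcpy(toy_pages[1], toy_pages[0], TOY_PAGE_SZ);

	/* cpu2: a stale writable entry lets the store hit the old page. */
	if (cpu2_tlb.valid && cpu2_tlb.write)
		toy_pages[cpu2_tlb.page][0] = 'X';

	/* cpu1: install the copy (the set_pte_at_notify() step). */
	pte.page = 1;
	pte.write = true;

	/* cpu2 without a cached entry: faults, then retries on the new PTE. */
	if (!cpu2_tlb.valid)
		toy_pages[pte.page][0] = 'X';

	return toy_pages[pte.page][0] == 'X';
}
```

run_interleaving(false) replays the diagram as drawn and returns false
(the store landed in the old page after the copy was taken), while
run_interleaving(true) returns true because cpu2 no longer has a stale
entry to write through.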