Re: [PATCH v1] mm/userfaultfd: propagate uffd-wp bit when PTE-mapping the huge zeropage

Peter Xu <peterx@xxxxxxxxxx> · Fri, 3 Mar 2023 09:32:56 -0500

On Fri, Mar 03, 2023 at 10:12:26AM +0100, David Hildenbrand wrote:
> On 02.03.23 23:29, Peter Xu wrote:
> > On Thu, Mar 02, 2023 at 06:54:23PM +0100, David Hildenbrand wrote:
> > > Currently, we'd lose the userfaultfd-wp marker when PTE-mapping a huge
> > > zeropage, resulting in the next write faults in the PMD range
> > > not triggering uffd-wp events.
> > > 
> > > Various actions (partial MADV_DONTNEED, partial mremap, partial munmap,
> > > partial mprotect) could trigger this. However, most importantly,
> > > un-protecting a single sub-page from the userfaultfd-wp handler when
> > > processing a uffd-wp event will PTE-map the shared huge zeropage and
> > > lose the uffd-wp bit for the remainder of the PMD.
> > > 
> > > Let's properly propagate the uffd-wp bit to the PMDs.
> > 
> > Ouch.. I thought this was reported once, probably it fell through the
> > cracks.
> 
> Yes, I reported it a while ago, but our understanding back then was that
> primarily MADV_DONTNEED would trigger it (which my reproducer back then
> did), and e.g., QEMU would make sure to not have concurrent MADV_DONTNEED
> while doing background snapshots.
> 
> I realized only yesterday when retesting my patch that that a simple
> unprotect is already sufficient to mess it up.

IIRC the rational was it was fine for the original design because we didn't
plan to protect none ptes, so missing zero ptes would be the same as none,
which was not a problem.

However that'll just make apps like QEMU stops working when they want to
trap none ptes by using a round of pre-read, then the workaround will not
work either.  Copy stable will make the workaround also work for stables.

-- 
Peter Xu