On Tue, Jan 05, 2021 at 10:08:13AM -0500, Peter Xu wrote: > On Fri, Dec 25, 2020 at 01:25:28AM -0800, Nadav Amit wrote: > > diff --git a/mm/mprotect.c b/mm/mprotect.c > > index ab709023e9aa..c08c4055b051 100644 > > --- a/mm/mprotect.c > > +++ b/mm/mprotect.c > > @@ -75,7 +75,8 @@ static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd, > > oldpte = *pte; > > if (pte_present(oldpte)) { > > pte_t ptent; > > - bool preserve_write = prot_numa && pte_write(oldpte); > > + bool preserve_write = (prot_numa || uffd_wp_resolve) && > > + pte_write(oldpte); > > Irrelevant of the other tlb issue, this is a standalone one and I commented in > v1 about simply ignore the change if necessary; unluckily that seems to be > ignored.. so I'll try again - would below be slightly better? > > if (uffd_wp_resolve && !pte_uffd_wp(oldpte)) > continue; I posted the exact same code before seeing the above so I take it as a good sign :). I'd suggest to add the reverse check to the uffd_wp too. > Firstly, current patch is confusing at least to me, because "uffd_wp_resolve" > means "unprotect the pte", whose write bit should mostly be cleared already > when uffd_wp_resolve is applicable. Then "preserve_write" for that pte looks > odd already. > > Meanwhile, if that really happens (when pte write bit set, but during a > uffd_wp_resolve request) imho there is really nothing we can do, so we should > simply avoid touching that at all, and also avoid ptep_modify_prot_start, > pte_modify, ptep_modify_prot_commit, calls etc., which takes extra cost. Agreed. It should not just defer the flush, by doing continue we will not flush anything. So ultimately the above will be an orthogonal optimization, but now I get the why the deferred tlb flush on the cpu0 of the v2 patch was the problematic one. I didn't see we lacked the above optimization and I thought we were discussing still the regular case where un-wrprotect is called on a pte with uffd-wp set. thanks, Andrea