On Fri, 4 Jun 2021, Peter Xu wrote: > On Thu, Jun 03, 2021 at 07:54:11PM -0700, Hugh Dickins wrote: > > On Thu, 3 Jun 2021, Peter Xu wrote: > > > On Tue, Jun 01, 2021 at 02:07:53PM -0700, Hugh Dickins wrote: > > > > diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c > > > > index 2cf01d933f13..b45d22738b45 100644 > > > > --- a/mm/page_vma_mapped.c > > > > +++ b/mm/page_vma_mapped.c > > > > @@ -212,6 +212,14 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw) > > > > pvmw->ptl = NULL; > > > > } > > > > } else if (!pmd_present(pmde)) { > > > > + /* > > > > + * If PVMW_SYNC, take and drop THP pmd lock so that we > > > > + * cannot return prematurely, while zap_huge_pmd() has > > > > + * cleared *pmd but not decremented compound_mapcount(). > > > > + */ > > > > + if ((pvmw->flags & PVMW_SYNC) && > > > > + PageTransCompound(pvmw->page)) > > > > + spin_unlock(pmd_lock(mm, pvmw->pmd)); > > > > return false; > > > > } > > > > if (!map_pte(pvmw)) > > > > > > Sorry if I missed something important, but I'm totally confused on how this > > > unlock is pairing with another lock().. > > > > I imagine you're reading that as spin_unlock(pmd_lockptr(blah)); > > no, the lock is right there, inside spin_unlock(pmd_lock(blah)). > > Heh, yeah... Sorry about that. I'll expand that line, as Kirill asks too. > > > > > > > > > And.. isn't PVMW_SYNC only meaningful for pte-level only (as I didn't see a > > > reference of it outside map_pte)? > > > > But you are pointing directly to its reference outside map_pte()! > > Right, I was trying to look for the lock() so I needed to look at all the rest > besides this one. :) > > I didn't follow Yang's patch, but if Yang's patch can make kernel not crashing > and fault handling done all well, then I'm kind of agree with him: having > workaround code (like taking lock and quickly releasing..) only for debug code > seems an overkill to me, not to mention that the debug code will be even more > strict after this patch, as it means it's even less likely that one can > reproduce one production host race with DEBUG_VM.. Normally debugging code > would affect code execution already, and for this one we're enlarging that gap > "explicitly" - not sure whether it's good. > > This also makes me curious what if we make !DEBUG_VM strict too - how much perf > we'll lose? Haven't even tried to think about it with details, but just raise > it up. Say, is there any chance we make the debug/non-debug paths run the same > logic (e.g. of SYNC version)? And Yang Shi suggests the same. Yes, I'm not fond of doing that differently for DEBUG_VM or not; but could never quite decide which way to jump. For so long as we worry about whether split_huge_page() is working correctly (and Wang Yugui still has a case that we have not solved), we do want the warning; and for so long as we have the warning, we do need the TTU_SYNC to prevent showing the warning unnecessarily. How much overhead added by doing TTU_SYNC now on !DEBUG_VM? On any sensible anon THP case, I don't think it could add overhead at all. But in some shmem cases (multiply mapped but sparsely populated, populated differently by different tasks) it could add overhead: dirtying lock cachelines in tasks which don't have the page mapped. But we're only talking about huge page splitting, that should not be #1 on anyone's performance list; and has all sorts of other overheads of its own. I think I'll go with your preference, and make this TTU_SYNC for all. We can easily revert to DEBUG_VM only if some regression is noticed. Hugh