Re: [PATCH] mm: Track mmu notifiers in fs_reclaim_acquire/release

Daniel Vetter <daniel@xxxxxxxx> · Sun, 21 Jun 2020 22:01:03 +0200

On Sun, Jun 21, 2020 at 08:07:08PM +0200, Daniel Vetter wrote:
> On Sun, Jun 21, 2020 at 7:42 PM Qian Cai <cai@xxxxxx> wrote:
> >
> > On Wed, Jun 10, 2020 at 09:41:01PM +0200, Daniel Vetter wrote:
> > > fs_reclaim_acquire/release nicely catch recursion issues when
> > > allocating GFP_KERNEL memory against shrinkers (which gpu drivers tend
> > > to use to keep the excessive caches in check). For mmu notifier
> > > recursions we do have lockdep annotations since 23b68395c7c7
> > > ("mm/mmu_notifiers: add a lockdep map for invalidate_range_start/end").
> > >
> > > But these only fire if a path actually results in some pte
> > > invalidation - for most small allocations that's very rarely the case.
> > > The other trouble is that pte invalidation can happen any time when
> > > __GFP_RECLAIM is set. Which means only really GFP_ATOMIC is a safe
> > > choice, GFP_NOIO isn't good enough to avoid potential mmu notifier
> > > recursion.
> > >
> > > I was pondering whether we should just do the general annotation, but
> > > there's always the risk for false positives. Plus I'm assuming that
> > > the core fs and io code is a lot better reviewed and tested than
> > > random mmu notifier code in drivers. Hence why I decide to only
> > > annotate for that specific case.
> > >
> > > Furthermore even if we'd create a lockdep map for direct reclaim, we'd
> > > still need to explicit pull in the mmu notifier map - there's a lot
> > > more places that do pte invalidation than just direct reclaim, these
> > > two contexts arent the same.
> > >
> > > Note that the mmu notifiers needing their own independent lockdep map
> > > is also the reason we can't hold them from fs_reclaim_acquire to
> > > fs_reclaim_release - it would nest with the acquistion in the pte
> > > invalidation code, causing a lockdep splat. And we can't remove the
> > > annotations from pte invalidation and all the other places since
> > > they're called from many other places than page reclaim. Hence we can
> > > only do the equivalent of might_lock, but on the raw lockdep map.
> > >
> > > With this we can also remove the lockdep priming added in 66204f1d2d1b
> > > ("mm/mmu_notifiers: prime lockdep") since the new annotations are
> > > strictly more powerful.
> > >
> > > v2: Review from Thomas Hellstrom:
> > > - unbotch the fs_reclaim context check, I accidentally inverted it,
> > >   but it didn't blow up because I inverted it immediately
> > > - fix compiling for !CONFIG_MMU_NOTIFIER
> > >
> > > Cc: Thomas Hellström (Intel) <thomas_os@xxxxxxxxxxxx>
> > > Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> > > Cc: Jason Gunthorpe <jgg@xxxxxxxxxxxx>
> > > Cc: linux-mm@xxxxxxxxx
> > > Cc: linux-rdma@xxxxxxxxxxxxxxx
> > > Cc: Maarten Lankhorst <maarten.lankhorst@xxxxxxxxxxxxxxx>
> > > Cc: Christian König <christian.koenig@xxxxxxx>
> > > Signed-off-by: Daniel Vetter <daniel.vetter@xxxxxxxxx>
> >
> > Replying the right patch here...
> >
> > Reverting this commit [1] fixed the lockdep warning below while applying
> > some memory pressure.
> >
> > [1] linux-next cbf7c9d86d75 ("mm: track mmu notifiers in fs_reclaim_acquire/release")
> 
> Hm, then I'm confused because
> - there's not mmut notifier lockdep map in the splat at a..
> - the patch is supposed to not change anything for fs_reclaim (but the
> interim version got that wrong)
> - looking at the paths it's kmalloc vs kswapd, both places I totally
> expect fs_reflaim to be used.
> 
> But you're claiming reverting this prevents the lockdep splat. If
> that's right, then my reasoning above is broken somewhere. Someone
> less blind than me having an idea?
> 
> Aside this is the first email I've typed, until I realized the first
> report was against the broken patch and that looked like a much more
> reasonable explanation (but didn't quite match up with the code
> paths).

Below diff should undo the functional change in my patch. Can you pls test
whether the lockdep splat is really gone with that? Might need a lot of
testing and memory pressure to be sure, since all these reclaim paths
aren't very deterministic.
-Daniel

---

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index d807587c9ae6..27ea763c6155 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4191,11 +4191,6 @@ void fs_reclaim_acquire(gfp_t gfp_mask)
 		if (gfp_mask & __GFP_FS)
 			__fs_reclaim_acquire();
 
-#ifdef CONFIG_MMU_NOTIFIER
-		lock_map_acquire(&__mmu_notifier_invalidate_range_start_map);
-		lock_map_release(&__mmu_notifier_invalidate_range_start_map);
-#endif
-
 	}
 }
 EXPORT_SYMBOL_GPL(fs_reclaim_acquire);
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch