On Sun, Jun 21, 2020 at 10:01:03PM +0200, Daniel Vetter wrote:
> On Sun, Jun 21, 2020 at 08:07:08PM +0200, Daniel Vetter wrote:
> > On Sun, Jun 21, 2020 at 7:42 PM Qian Cai <cai@xxxxxx> wrote:
> > >
> > > On Wed, Jun 10, 2020 at 09:41:01PM +0200, Daniel Vetter wrote:
> > > > fs_reclaim_acquire/release nicely catch recursion issues when
> > > > allocating GFP_KERNEL memory against shrinkers (which gpu drivers tend
> > > > to use to keep excessive caches in check). For mmu notifier
> > > > recursions we do have lockdep annotations since 23b68395c7c7
> > > > ("mm/mmu_notifiers: add a lockdep map for invalidate_range_start/end").
> > > >
> > > > But these only fire if a path actually results in some pte
> > > > invalidation - for most small allocations that's very rarely the case.
> > > > The other trouble is that pte invalidation can happen any time when
> > > > __GFP_RECLAIM is set. Which means really only GFP_ATOMIC is a safe
> > > > choice; GFP_NOIO isn't good enough to avoid potential mmu notifier
> > > > recursion.
> > > >
> > > > I was pondering whether we should just do the general annotation, but
> > > > there's always the risk of false positives. Plus I'm assuming that
> > > > the core fs and io code is a lot better reviewed and tested than
> > > > random mmu notifier code in drivers. Hence why I decided to only
> > > > annotate for that specific case.
> > > >
> > > > Furthermore, even if we'd create a lockdep map for direct reclaim, we'd
> > > > still need to explicitly pull in the mmu notifier map - there are a lot
> > > > more places that do pte invalidation than just direct reclaim; these
> > > > two contexts aren't the same.
> > > >
> > > > Note that the mmu notifiers needing their own independent lockdep map
> > > > is also the reason we can't hold it from fs_reclaim_acquire to
> > > > fs_reclaim_release - it would nest with the acquisition in the pte
> > > > invalidation code, causing a lockdep splat. And we can't remove the
> > > > annotations from pte invalidation and all the other places, since
> > > > they're called from many more places than just page reclaim. Hence we
> > > > can only do the equivalent of might_lock, but on the raw lockdep map.
> > > >
> > > > With this we can also remove the lockdep priming added in 66204f1d2d1b
> > > > ("mm/mmu_notifiers: prime lockdep") since the new annotations are
> > > > strictly more powerful.
> > > >
> > > > v2: Review from Thomas Hellstrom:
> > > > - unbotch the fs_reclaim context check; I accidentally inverted it,
> > > >   but it didn't blow up because I inverted it immediately
> > > > - fix compiling for !CONFIG_MMU_NOTIFIER
> > > >
> > > > Cc: Thomas Hellström (Intel) <thomas_os@xxxxxxxxxxxx>
> > > > Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> > > > Cc: Jason Gunthorpe <jgg@xxxxxxxxxxxx>
> > > > Cc: linux-mm@xxxxxxxxx
> > > > Cc: linux-rdma@xxxxxxxxxxxxxxx
> > > > Cc: Maarten Lankhorst <maarten.lankhorst@xxxxxxxxxxxxxxx>
> > > > Cc: Christian König <christian.koenig@xxxxxxx>
> > > > Signed-off-by: Daniel Vetter <daniel.vetter@xxxxxxxxx>
> > >
> > > Replying to the right patch here...
> > >
> > > Reverting this commit [1] fixed the lockdep warning below while applying
> > > some memory pressure.
> > >
> > > [1] linux-next cbf7c9d86d75 ("mm: track mmu notifiers in fs_reclaim_acquire/release")
> >
> > Hm, then I'm confused because
> > - there's no mmu notifier lockdep map in the splat at all
> > - the patch is supposed to not change anything for fs_reclaim (but the
> > interim version got that wrong)
> > - looking at the paths it's kmalloc vs kswapd, both places where I totally
> > expect fs_reclaim to be used.
> >
> > But you're claiming reverting this prevents the lockdep splat. If
> > that's right, then my reasoning above is broken somewhere. Someone
> > less blind than me having an idea?
> >
> > Aside: this is the first email I typed, before I realized the first
> > report was against the broken patch and that looked like a much more
> > reasonable explanation (but didn't quite match up with the code
> > paths).
>
> Below diff should undo the functional change in my patch. Can you pls test
> whether the lockdep splat is really gone with that? Might need a lot of
> testing and memory pressure to be sure, since all these reclaim paths
> aren't very deterministic.

Well, I am running even heavy memory pressure workloads on linux-next
like every day, and never saw this splat until today, when your patch
first showed up. Since I am rather busy tracking down another regression,
here are the steps to reproduce (it is super easy to reproduce on
multiple machines here):

# git clone https://github.com/cailca/linux-mm.git
# cd linux-mm; make
# ./random 0

The .config is in there as well, if it ever matters.
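
For reference, the "equivalent of might_lock, but on the raw lockdep map"
from the commit message boils down to a paired acquire/release of the mmu
notifier lockdep map inside fs_reclaim_acquire(). Below is a minimal sketch
of that idea, not the exact patch: the gfp checks are simplified here, and
__mmu_notifier_invalidate_range_start_map is assumed to be the lockdep map
declared in mm/mmu_notifier.c.

void fs_reclaim_acquire(gfp_t gfp_mask)
{
	gfp_mask = current_gfp_context(gfp_mask);

	if (gfp_mask & __GFP_DIRECT_RECLAIM) {
		if (gfp_mask & __GFP_FS)
			__fs_reclaim_acquire();

#ifdef CONFIG_MMU_NOTIFIER
		/*
		 * "might_lock" on the raw map: tell lockdep that any
		 * reclaim-capable allocation may end up in mmu notifier
		 * invalidation, without actually holding the map across
		 * reclaim - holding it would nest with the acquisition
		 * in the pte invalidation paths and splat.
		 */
		lock_map_acquire(&__mmu_notifier_invalidate_range_start_map);
		lock_map_release(&__mmu_notifier_invalidate_range_start_map);
#endif
	}
}

This is also why, per the discussion above, the patch is not supposed to
change anything for fs_reclaim itself: the only functional addition is the
acquire/release pair on the mmu notifier map.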