On Thu, Aug 22, 2019 at 11:10:57AM +0200, Peter Zijlstra wrote: > On Thu, Aug 22, 2019 at 10:51:30AM +0200, Peter Zijlstra wrote: > > On Thu, Aug 22, 2019 at 12:59:48AM -0700, Christoph Hellwig wrote: > > > On Thu, Aug 22, 2019 at 10:31:32AM +1000, Dave Chinner wrote: > > > > > Btw, I think we should eventually kill off KM_NOFS and just use > > > > > PF_MEMALLOC_NOFS in XFS, as the interface makes so much more sense. > > > > > But that's something for the future. > > > > > > > > Yeah, and it's not quite as simple as just using PF_MEMALLOC_NOFS > > > > at high levels - we'll still need to annotate callers that use KM_NOFS > > > > to avoid lockdep false positives. i.e. any code that can be called from > > > > GFP_KERNEL and reclaim context will throw false positives from > > > > lockdep if we don't annotate tehm correctly.... > > > > > > Oh well. For now we have the XFS kmem_wrappers to turn that into > > > GFP_NOFS so we shouldn't be too worried, but I think that is something > > > we should fix in lockdep to ensure it is generally useful. I've added > > > the maintainers and relevant lists to kick off a discussion. > > > > Strictly speaking the fs_reclaim annotation is no longer part of the > > lockdep core, but is simply a fake lock in page_alloc.c and thus falls > > under the mm people's purview. > > > > That said; it should be fairly straight forward to teach > > __need_fs_reclaim() about PF_MEMALLOC_NOFS, much like how it already > > knows about PF_MEMALLOC. > > Ah, current_gfp_context() already seems to transfer PF_MEMALLOC_NOFS > into the GFP flags. > > So are we sure it is broken and needs mending? Well, that's what we are trying to work out. The problem is that we have code that takes locks and does allocations that is called both above and below the reclaim "lock" context. Once it's been seen below the reclaim lock context, calling it with GFP_KERNEL context above the reclaim lock context throws a deadlock warning. The only way around that was to mark these allocation sites as GFP_NOFS so lockdep is never allowed to see that recursion through reclaim occur. Even though it isn't a deadlock vector. What we're looking at is whether PF_MEMALLOC_NOFS changes this - I don't think it does solve this problem. i.e. if we define the allocation as GFP_KERNEL and then use PF_MEMALLOC_NOFS where reclaim is not allowed, we still have GFP_KERNEL allocations in code above reclaim that has also been seen below relcaim. And so we'll get false positive warnings again. What I think we are going to have to do here is manually audit each of the KM_NOFS call sites as we remove the NOFS from them and determine if ___GFP_NOLOCKDEP is needed to stop lockdep from trying to track these allocation sites. We've never used this tag because we'd already fixed most of these false positives with explicit GFP_NOFS tags long before ___GFP_NOLOCKDEP was created. But until someone starts doing the work, I don't know if it will work or even whether conversion PF_MEMALLOC_NOFS is going to introduce a bunch of new ways to get false positives from lockdep... Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx