Re: Xfs lockdep warning with for-dave-for-4.6 branch

Michal Hocko <mhocko@xxxxxxxxxx> · Thu, 2 Jun 2016 17:46:19 +0200

On Thu 02-06-16 17:11:16, Peter Zijlstra wrote:
> On Thu, Jun 02, 2016 at 04:50:49PM +0200, Michal Hocko wrote:
> > On Wed 01-06-16 20:16:17, Peter Zijlstra wrote:
> 
> > > So my favourite is the dedicated GFP flag, but if that's unpalatable for
> > > the mm folks then something like the below might work. It should be
> > > similar in effect to your proposal, except it's more limited in scope.
> > [...]
> > > @@ -2876,11 +2883,36 @@ static void __lockdep_trace_alloc(gfp_t gfp_mask, unsigned long flags)
> > >  	if (DEBUG_LOCKS_WARN_ON(irqs_disabled_flags(flags)))
> > >  		return;
> > >  
> > > +	/*
> > > +	 * Skip _one_ allocation as per the lockdep_skip_alloc() request.
> > > +	 * Must be done last so that we don't lose the annotation for
> > > +	 * GFP_ATOMIC like things from IRQ or other nesting contexts.
> > > +	 */
> > > +	if (current->lockdep_reclaim_gfp & __GFP_SKIP_ALLOC) {
> > > +		current->lockdep_reclaim_gfp &= ~__GFP_SKIP_ALLOC;
> > > +		return;
> > > +	}
> > > +
> > >  	mark_held_locks(curr, RECLAIM_FS);
> > >  }
> > 
> > I might be missing something, but does this actually work? Say you
> > want a kmalloc(size); it would call
> > slab_alloc_node
> >   slab_pre_alloc_hook
> >     lockdep_trace_alloc
> > [...]
> >   ____cache_alloc_node
> >     cache_grow_begin
> >       kmem_getpages
> >         __alloc_pages_node
> > 	  __alloc_pages_nodemask
> > 	    lockdep_trace_alloc
> 
> Bugger :/ You're right, that would fail.
> 
> So how about doing:
> 
> #define __GFP_NOLOCKDEP	(1u << __GFP_BITS_SHIFT)

Hmm, now that I have looked closer, this would break GFP_SLAB_BUG_MASK :/
The whole thing is a bit hysterical because I really do not see any
reason to blow up just because somebody has used an incorrect gfp mask
(we have users in the tree who give us combinations that make no sense...)

We can fix that either by dropping the whole GFP_SLAB_BUG_MASK thingy
or by updating it to account for __GFP_NOLOCKDEP. It just shows how this
might get really tricky and subtle.
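
A rough sketch of the problem, with made-up constants (the SKETCH_*
names and the shift value are illustrative only, not the kernel's
definitions). It assumes the slab check rejects any bit outside
__GFP_BITS_MASK, which is how I read GFP_SLAB_BUG_MASK, and leaves out
the zone flags that the real mask rejects as well:

```c
#include <stdbool.h>

/* Illustrative values only; the real ones live in include/linux/gfp.h. */
#define SKETCH_GFP_BITS_SHIFT	26
#define SKETCH_GFP_BITS_MASK	((1u << SKETCH_GFP_BITS_SHIFT) - 1)

/* The proposed flag sits just above the last regular gfp bit. */
#define SKETCH_GFP_NOLOCKDEP	(1u << SKETCH_GFP_BITS_SHIFT)

/* Simplified bug mask: every bit outside the regular gfp bits. */
#define SKETCH_SLAB_BUG_MASK	(~SKETCH_GFP_BITS_MASK)

/* Possible fix: carve the new flag out of the bug mask. */
#define SKETCH_SLAB_BUG_MASK_FIXED \
	(SKETCH_SLAB_BUG_MASK & ~SKETCH_GFP_NOLOCKDEP)

/* Returns true when the slab sanity check would trigger for these flags. */
static bool sketch_slab_would_bug(unsigned int flags, unsigned int bug_mask)
{
	return (flags & bug_mask) != 0;
}
```

With the unmodified mask any caller passing the new flag trips the
check; with the carved-out mask it passes, but every such mask then has
to know about the extra bit.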

> this means it cannot be part of address_space::flags or
> radix_tree_root::gfp_mask, but that might not be a bad thing.

True, those shouldn't really care.

> And this solves the scarcity thing, because per pagemap we need to have
> 5 'spare' bits anyway.
> 
> > I understand your concerns about the scope but usually all allocations
> > have to be __GFP_NOFS or none in the same scope so I wouldn't see it
> > as a huge deal.
> 
> With scope I mostly meant the fact that you have two calls that you need
> to pair up. That's not really nice as you can 'annotate' a _lot_ of code
> in between. I prefer the narrower annotations where you annotate a
> single specific site.

Yes, I can see your point. What I meant to say is that we would most
probably end up with the following pattern
	lockdep_trace_alloc_enable()
	some_foo_with_alloc(gfp_mask);
	lockdep_trace_alloc_disable()

and some_foo_with_alloc might be a lot of code. But at the same time we
know that _any_ allocation done from that context is safe from the
reclaim recursion POV. If not, then the annotation is buggy and needs to
be done at a different level, but that would be exactly the same if we
did some_foo_with_alloc(gfp_mask|__GFP_NOLOCKDEP) because all the
allocations down that road would reuse the same gfp mask anyway.
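
To make the difference concrete, here is a toy userspace model (all
sketch_* names are hypothetical, nothing here is real lockdep code). It
assumes, as in the kmalloc call chain quoted above, that one allocation
can reach the trace hook twice: once from the slab hook and once from
the page allocator.

```c
#include <stdbool.h>

/* Toy stand-in for the per-task lockdep state. */
struct sketch_task {
	bool skip_one;		/* one-shot request, cleared on first use */
	int scope_depth;	/* > 0 while inside an annotated scope */
	int marked;		/* how often held locks got marked RECLAIM_FS */
};

/* Models the trace hook: skip if asked to, otherwise mark held locks. */
static void sketch_trace_alloc(struct sketch_task *t)
{
	if (t->scope_depth > 0)
		return;
	if (t->skip_one) {
		t->skip_one = false;
		return;
	}
	t->marked++;
}

/* Models kmalloc when the cache has to grow: the hook runs twice. */
static void sketch_kmalloc(struct sketch_task *t)
{
	sketch_trace_alloc(t);	/* from the slab pre-alloc hook */
	sketch_trace_alloc(t);	/* from the page allocator */
}

/* Scope-based annotation covering everything between enter and exit. */
static void sketch_scope_enter(struct sketch_task *t)
{
	t->scope_depth++;
}

static void sketch_scope_exit(struct sketch_task *t)
{
	t->scope_depth--;
}
```

In this model a one-shot skip is used up by the first hook, so the
nested call still marks the held locks, while the scope covers both
calls no matter how deep the allocation goes.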

That being said, I completely agree that a single entry point is much
less error prone, but it is also tricky, as we can see. So I would rather
go with something less tricky. It's not like people are not used to
the enable/disable pattern.

Anyway, I will leave the decision to you. If you really insist on
__GFP_NOLOCKDEP which doesn't consume a new flag then I can review the
resulting patch.
-- 
Michal Hocko
SUSE Labs
