On Mon, May 16, 2016 at 03:25:41PM +0200, Peter Zijlstra wrote:
> On Mon, May 16, 2016 at 03:05:19PM +0200, Michal Hocko wrote:
> > On Mon 16-05-16 12:41:30, Peter Zijlstra wrote:
> > > On Fri, May 13, 2016 at 06:03:41PM +0200, Michal Hocko wrote:
> > IIRC from Dave's emails, they have tried that by using lockdep
> > classes and that turned out to be an overly complex maze which still
> > doesn't work 100% reliably.
>
> So that would be the:
>
> > > but we can't tell lockdep that easily in any way
> > > without going back to the bad old ways of creating a new lockdep
> > > class for inode ilocks the moment they enter ->evict. This then
> > > disables "entire lifecycle" lockdep checking on the xfs inode ilock,
> > > which is why we got rid of it in the first place."
> > >
> > > But fails to explain the problems with the 'old' approach.
> > >
> > > So clearly this is a 'problem' that has existed for quite a while, so I
> > > don't see any need to rush half-baked solutions either.
> >
> > Well, at least my motivation for _some_ solution here is that xfs has
> > worked around this deficiency by forcing GFP_NOFS also for contexts
> > which are perfectly OK to do __GFP_FS allocation. And that in turn
> > leads to other issues which I would really like to sort out. So the
> > idea was to give xfs another way to express that workaround that
> > would be a noop without lockdep configured.
>
> Right, that's unfortunate. But I would really like to understand the
> problem with the classes vs lifecycle thing.
>
> Is there an email explaining that somewhere?

Years ago (i.e. the last time I bothered mentioning that lockdep didn't
cover these cases), but buggered if I can find a reference.

We used to have iolock classes. They were added in commit 033da48 ("xfs:
reset the i_iolock lock class in the reclaim path") in 2010 and removed
by commit 4f59af7 ("xfs: remove iolock lock classes") a couple of years
later.

We needed distinct lock classes above and below reclaim because the
same locks were taken above and below memory allocation, and the
recycling of reclaimed inodes made lockdep think it had a loop in its
graph:

inode lookup            memory reclaim          inode lookup
------------            --------------          ------------
not found
allocate inode
init inode
take active reference
return to caller
.....
                        inode shrinker
                        find inode with no
                          active reference
                        ->evict
                        start transaction
                        take inode locks
                        .....
                        commit transaction
                        mark for reclaim
                        .....
                                                find inactive inode
                                                  in reclaimable state
                                                remove from reclaim list
                                                re-initialise inode state
                                                take active reference
                                                return to caller

So you can see that an inode can pass through the reclaim context -
taking locks along the way - without being freed, and then be
immediately reused in a non-reclaim context. Hence we had to split the
two lockdep contexts and re-initialise the lock classes in the
different allocation paths.

The reason we don't have lock classes for the ilock is that we aren't
supposed to call memory reclaim with that lock held in exclusive mode.
This is because reclaim can run transactions, and a transaction may
need to flush dirty inodes to make progress. Flushing a dirty inode
requires taking the ilock in shared mode.

In the code path that was reported, we hold the ilock in /shared/ mode
with no transaction context (we are doing a read-only operation). This
means we can run transactions in memory reclaim because a) we can't
deadlock on the inode we hold locks on, and b) transaction reservations
will be able to make progress as we don't hold any locks they can
block on.
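To make the class-splitting mechanics concrete, here is a minimal
sketch of the technique - not the actual XFS code: every struct,
function and key name below is made up for illustration, and a single
rw_semaphore stands in for the real iolock. The point is only that
lockdep keys are assigned per class, so re-assigning the lock's class
on the enter-reclaim and recycle transitions keeps the above-reclaim
and below-reclaim acquisitions in disjoint lockdep graphs:

/*
 * Sketch only: splitting "active" vs "reclaim" lockdep classes for a
 * lock that is taken both above and below memory reclaim. All names
 * here are illustrative, not the real XFS ones.
 */
#include <linux/lockdep.h>
#include <linux/rwsem.h>

static struct lock_class_key demo_iolock_active_class;
static struct lock_class_key demo_iolock_reclaim_class;

struct demo_inode {
	struct rw_semaphore	i_iolock;
	/* ... */
};

/* Normal allocation path: lock belongs to the "active" class. */
static void demo_inode_init(struct demo_inode *ip)
{
	init_rwsem(&ip->i_iolock);
	lockdep_set_class_and_name(&ip->i_iolock,
				   &demo_iolock_active_class,
				   "demo_iolock_active");
}

/*
 * Reclaim path: before the shrinker takes the lock, move it into a
 * separate class so acquisitions below reclaim don't form a cycle in
 * lockdep's graph with acquisitions above it.
 */
static void demo_inode_enter_reclaim(struct demo_inode *ip)
{
	lockdep_set_class_and_name(&ip->i_iolock,
				   &demo_iolock_reclaim_class,
				   "demo_iolock_reclaim");
}

/*
 * Recycling path: the inode was never freed, so the lock must be put
 * back in the "active" class before the inode is handed out again -
 * this is the re-initialisation step described above.
 */
static void demo_inode_recycle(struct demo_inode *ip)
{
	lockdep_set_class_and_name(&ip->i_iolock,
				   &demo_iolock_active_class,
				   "demo_iolock_active");
}

Note that lockdep_set_class_and_name() may only be called on a lock
that is not currently held, which is why the class change sits on the
state transitions rather than in the lock/unlock paths themselves.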
Sure, we can have multiple lock classes so that this can be done, but
that then breaks lock order checking between the two contexts. Both
contexts require locking order and transactional behaviour to be
identical and, from this perspective, class separation is effectively
no different from turning lockdep off.

For the ilock, the number of places where the ilock is held over
GFP_KERNEL allocations is pretty small. Hence we've simply added
GFP_NOFS to those allocations to - effectively - annotate them as
"lockdep causes problems here". There are probably 30-35 allocations
in XFS that explicitly use KM_NOFS - some of these are masking lockdep
false positive reports.

The reason why we had lock classes for the *iolock* (not the ilock)
was that ->evict processing used to require the iolock for attribute
removal. This caused all sorts of problems with GFP_KERNEL allocations
in the IO path (e.g. allocating a page cache page under the IO lock
could trigger lockdep warnings via direct reclaim - another reason why
XFS always used GFP_NOFS contexts for page cache allocations), all of
which were false positives because the inode holding the lock has
active references and hence will never be accessed by the reclaim
path. The only workable solution we could come up with at the time
this was reported was to split the lock classes.

In the end, like pretty much all the complex lockdep false positives
we've had to deal with in XFS, we've ended up changing the locking or
allocation contexts, because that's been far easier than trying to
make annotations cover everything or convincing other people that
lockdep annotations are insufficient.

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
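As a footnote to the KM_NOFS point above, a minimal sketch of that
annotation pattern - the helper and struct below are invented for
illustration and are not actual XFS code. KM_NOFS is XFS's kmem flag
that maps to GFP_NOFS, i.e. __GFP_FS is cleared so the allocation can
never recurse into filesystem reclaim and therefore can never take the
locks lockdep is complaining about:

/*
 * Sketch only: the "annotate with GFP_NOFS to silence lockdep"
 * workaround. The function and struct are illustrative, not real
 * XFS code.
 */
#include <linux/slab.h>

struct demo_attr_buf {
	size_t	len;
	char	data[];
};

/*
 * Called with the inode ilock held in shared mode. A plain GFP_KERNEL
 * allocation here is actually safe (per the shared-lock argument
 * above), but lockdep can't tell, so the allocation uses GFP_NOFS
 * purely to suppress the false positive.
 */
static struct demo_attr_buf *demo_attr_buf_alloc(size_t len)
{
	struct demo_attr_buf *buf;

	buf = kmalloc(sizeof(*buf) + len, GFP_NOFS);
	if (!buf)
		return NULL;
	buf->len = len;
	return buf;
}

This is the workaround Michal refers to at the top of the thread: the
allocation is more constrained than it needs to be, purely to keep
lockdep quiet.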