Re: Multi-CPU harmless lockdep on x86 while copying data

Dave Chinner <david@xxxxxxxxxxxxx> · Tue, 11 Mar 2014 07:46:47 +1100

On Mon, Mar 10, 2014 at 03:37:16AM -0700, Christoph Hellwig wrote:
> On Mon, Mar 10, 2014 at 01:55:23PM +1100, Dave Chinner wrote:
> > Changing the directory code to handle this sort of locking is going
> > to require a bit of surgery. However, I can see advantages to moving
> > directory data to the same locking strategy as regular file data -
> > locking hierarchies are identical, directory ilock hold times are
> > much reduced, we don't get lockdep whining about taking page faults
> > with the ilock held, etc.
> > 
> > A quick hack to demonstrate the high level, initial step of using
> > the IOLOCK for readdir serialisation. I've done a little smoke
> > testing on it, so it won't die immediately. It should get rid of all
> > the nasty lockdep issues, but it doesn't start to address the deeper
> > restructuring that is needed.
> 
> What synchronization do we actually need from the iolock?  Pushing the
> ilock down to where it's actually needed is a good idea either way,
> though.

The issue is that if we push the ilock down to just the block
mapping routines, the directory can be modified while the readdir is
in progress. That's the root problem that adding the ilock solved.
Now, just pushing the ilock down to protect the bmbt lookups might
result in a consistent lookup, but it won't serialise sanely against
modifications.

That is, readdir only walks one dir block at a time, but it maps
multiple blocks for readahead, keeps them in a local array, and
doesn't validate them again before issuing reads on those buffers.
Hence at a high level we currently have to serialise readdir
against all directory modifications.
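
To make the stale-mapping problem concrete, here is a small user-space
sketch. It is not the XFS code: the toy_* names, the flat map array and
the window size are all made up for illustration. It only shows that
mappings cached for readahead go stale if a modification slips in after
the lookup and nothing revalidates them.

```c
#include <assert.h>
#include <stddef.h>

#define DIR_BLOCKS 4
#define RA_WINDOW  3

/* Toy directory: logical block index -> "disk address" (hypothetical). */
struct toy_dir {
	int map[DIR_BLOCKS];
};

/* Readahead state: mappings are cached once and never revalidated. */
struct toy_readahead {
	int    cached[RA_WINDOW];
	size_t first;		/* first logical block covered */
};

/* Map RA_WINDOW blocks starting at 'first' into the local array. */
static void toy_map_readahead(const struct toy_dir *d,
			      struct toy_readahead *ra, size_t first)
{
	assert(first + RA_WINDOW <= DIR_BLOCKS);
	ra->first = first;
	for (size_t i = 0; i < RA_WINDOW; i++)
		ra->cached[i] = d->map[first + i];
}

/* A directory modification that relocates one logical block. */
static void toy_modify(struct toy_dir *d, size_t blk, int new_addr)
{
	d->map[blk] = new_addr;
}

/* Count cached mappings that no longer match the directory. */
static int toy_count_stale(const struct toy_dir *d,
			   const struct toy_readahead *ra)
{
	int stale = 0;

	for (size_t i = 0; i < RA_WINDOW; i++)
		if (ra->cached[i] != d->map[ra->first + i])
			stale++;
	return stale;
}

/* Lookup, then an unserialised modification, then count stale entries. */
static int toy_stale_after_unserialised_modify(void)
{
	struct toy_dir d = { .map = { 100, 101, 102, 103 } };
	struct toy_readahead ra;

	toy_map_readahead(&d, &ra, 0);
	toy_modify(&d, 1, 200);	/* happens after the mapping was cached */
	return toy_count_stale(&d, &ra);
}
```

In this toy, one of the three cached mappings is stale by the time the
"read" would be issued, which is the window the high level
serialisation is there to close.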

The only other option we might have is to completely rewrite the
directory readahead code not to cache mappings. If we use the ilock
purely for bmbt lookup and buffer read, then the ilock will
serialise against modification, and the buffer lock will stabilise
the buffer until the readdir moves to the next buffer and picks the
ilock up again to read it.
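
A rough user-space sketch of that alternative, under the same caveats:
the toy_* names are invented, the "ilock" is just a flag, and the
buffer lock is not modelled at all. The only point is that the lookup
and the read for each block happen under one lock hold, so no mapping
outlives the lock that produced it.

```c
#include <assert.h>
#include <stddef.h>

#define NBLOCKS 4

struct toy_dir {
	int locked;		/* stand-in for the ilock */
	int map[NBLOCKS];	/* logical block -> "disk address" */
};

static void toy_ilock(struct toy_dir *d)
{
	assert(!d->locked);
	d->locked = 1;
}

static void toy_iunlock(struct toy_dir *d)
{
	assert(d->locked);
	d->locked = 0;
}

/* Look up and "read" one block with the lock held across both steps. */
static int toy_read_block(struct toy_dir *d, size_t blk)
{
	int addr;

	toy_ilock(d);
	addr = d->map[blk];	/* bmbt lookup stand-in */
	/* the buffer read would be issued here, still under the lock */
	toy_iunlock(d);
	return addr;
}

/* Modifications take the same lock, so they cannot split a lookup/read. */
static void toy_modify(struct toy_dir *d, size_t blk, int new_addr)
{
	toy_ilock(d);
	d->map[blk] = new_addr;
	toy_iunlock(d);
}

/* Walk all blocks with a modification in mid-walk; count stale reads. */
static int toy_walk_with_modify(void)
{
	struct toy_dir d = { .locked = 0, .map = { 100, 101, 102, 103 } };
	int stale = 0;

	for (size_t blk = 0; blk < NBLOCKS; blk++) {
		if (blk == 1)
			toy_modify(&d, 2, 200);	/* between two reads */
		if (toy_read_block(&d, blk) != d.map[blk])
			stale++;
	}
	return stale;
}
```

The walk sees no stale address because nothing is cached between
blocks. The price is that readahead can no longer reuse mappings it
looked up earlier.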

That would avoid the need for high level serialisation, but it's a
lot more work than using the iolock to provide the high level
serialisation and I'm still not sure it's 100% safe. And I've got no
idea if it would work for CXFS. Hopefully someone from SGI will
chime in here....

> > This would be a straightforward change, except for two things:
> > filestreams and lockdep. The filestream allocator takes the
> > directory iolock and makes assumptions about parent->child locking
> > order of the iolock which will now be invalidated. Hence some
> > changes to the filestreams code are needed to ensure that it never
> > blocks on directory iolocks and deadlocks. Instead, it needs to fail
> > stream associations when such problems occur.
> 
> I think the right fix is to stop abusing the iolock in filestreams.
> To me it seems like a lock inside fstrm_item_t should be fine
> for what the filestreams code wants if I understand it correctly.
> 
> From looking over some of the filestreams code just for a few minutes
> I get an urge to redo lots of it right now..

I get that urge from time to time, too. So far I've managed to avoid
it.

> > @@ -1228,7 +1244,7 @@ xfs_create(
> >  	 * the transaction cancel unlocking dp so don't do it explicitly in the
> >  	 * error path.
> >  	 */
> > -	xfs_trans_ijoin(tp, dp, XFS_ILOCK_EXCL);
> > +	xfs_trans_ijoin(tp, dp, XFS_IOLOCK_EXCL | XFS_ILOCK_EXCL);
> 
> What do we need the iolock on these operations for?

These are providing the high level readdir vs modification
serialisation protection. And we have to unlock it on transaction
commit, which is why it needs to be added to the xfs_trans_ijoin()
calls...
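
A toy model of why the flag has to go into the join, assuming only what
is said above: the locks passed at join time are the ones dropped at
commit. The toy_* names and flag values are invented for the sketch and
are not the kernel definitions.

```c
#include <assert.h>

/* Hypothetical flag values, not the real XFS ones. */
#define TOY_IOLOCK_EXCL 0x1u
#define TOY_ILOCK_EXCL  0x2u

struct toy_inode {
	unsigned held;			/* locks currently held */
};

struct toy_trans {
	struct toy_inode *ip;		/* joined inode */
	unsigned unlock_flags;		/* locks to drop at commit */
};

static void toy_lock(struct toy_inode *ip, unsigned flags)
{
	assert((ip->held & flags) == 0);
	ip->held |= flags;
}

/* Join an already locked inode; remember which locks commit must drop. */
static void toy_trans_ijoin(struct toy_trans *tp, struct toy_inode *ip,
			    unsigned flags)
{
	assert((ip->held & flags) == flags);
	tp->ip = ip;
	tp->unlock_flags = flags;
}

/* Commit drops exactly the locks recorded at join time. */
static void toy_trans_commit(struct toy_trans *tp)
{
	tp->ip->held &= ~tp->unlock_flags;
	tp->unlock_flags = 0;
}

/* Take both locks, join with 'join_flags', commit, report what is left. */
static unsigned toy_create_leftover(unsigned join_flags)
{
	struct toy_inode dp = { 0 };
	struct toy_trans tp = { 0 };

	toy_lock(&dp, TOY_IOLOCK_EXCL | TOY_ILOCK_EXCL);
	toy_trans_ijoin(&tp, &dp, join_flags);
	toy_trans_commit(&tp);
	return dp.held;
}
```

Joining with only the ilock flag leaves the iolock held after commit in
this model, while joining with both flags leaves nothing held.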

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
