Hi,

On Tue, Mar 11, 2014 at 07:46:47AM +1100, Dave Chinner wrote:
> On Mon, Mar 10, 2014 at 03:37:16AM -0700, Christoph Hellwig wrote:
> > On Mon, Mar 10, 2014 at 01:55:23PM +1100, Dave Chinner wrote:
> > > Changing the directory code to handle this sort of locking is going
> > > to require a bit of surgery. However, I can see advantages to moving
> > > directory data to the same locking strategy as regular file data -
> > > locking hierarchies are identical, directory ilock hold times are
> > > much reduced, we don't get lockdep whining about taking page faults
> > > with the ilock held, etc.
> > >
> > > A quick hack to demonstrate the high level, initial step of using
> > > the IOLOCK for readdir serialisation. I've done a little smoke
> > > testing on it, so it won't die immediately. It should get rid of all
> > > the nasty lockdep issues, but it doesn't start to address the deeper
> > > restructuring that is needed.
> >
> > What synchronization do we actually need from the iolock? Pushing the
> > ilock down to where it's actually needed is a good idea either way,
> > though.
>
> The issue is that if we push the ilock down to just the block
> mapping routines, the directory can be modified while the readdir is
> in progress. That's the root problem that adding the ilock solved.
> Now, just pushing the ilock down to protect the bmbt lookups might
> result in a consistent lookup, but it won't serialise sanely against
> modifications.
>
> i.e. readdir only walks one dir block at a time but
> it maps multiple blocks for readahead and keeps them in a local
> array and doesn't validate them again before issuing read on those
> buffers. Hence at a high level we currently have to serialise
> readdir against all directory modifications.
>
> The only other option we might have is to completely rewrite the
> directory readahead code not to cache mappings.
> If we use the ilock
> purely for bmbt lookup and buffer read, then the ilock will
> serialise against modification, and the buffer lock will stabilise
> the buffer until the readdir moves to the next buffer and picks the
> ilock up again to read it.
>
> That would avoid the need for high level serialisation, but it's a
> lot more work than using the iolock to provide the high level
> serialisation and I'm still not sure it's 100% safe. And I've got no
> idea if it would work for CXFS. Hopefully someone from SGI will
> chime in here....

Also in leaf and node formats a single modification can change multiple
buffers, so I suspect the buffer lock isn't enough serialization to maintain a
consistent directory in the face of multiple readers and writers. The iolock
does resolve that issue.

> > > This would be a straightforward change, except for two things:
> > > filestreams and lockdep. The filestream allocator takes the
> > > directory iolock and makes assumptions about parent->child locking
> > > order of the iolock which will now be invalidated. Hence some
> > > changes to the filestreams code are needed to ensure that it never
> > > blocks on directory iolocks and deadlocks. Instead it needs to fail
> > > stream associations when such problems occur.
> >
> > I think the right fix is to stop abusing the iolock in filestreams.
> > To me it seems like a look inside fstrm_item_t should be fine
> > for what the filestreams code wants if I understand it correctly.
> >
> > From looking over some of the filestreams code just for a few minutes
> > I get an urge to redo lots of it right now..
>
> I get that urge from time to time, too. So far I've managed to avoid
> it.
>
> > > @@ -1228,7 +1244,7 @@ xfs_create(
> > >  	 * the transaction cancel unlocking dp so don't do it explicitly in the
> > >  	 * error path.
> > >  	 */
> > > -	xfs_trans_ijoin(tp, dp, XFS_ILOCK_EXCL);
> > > +	xfs_trans_ijoin(tp, dp, XFS_IOLOCK_EXCL | XFS_ILOCK_EXCL);
> >
> > What do we need the iolock on these operations for?
>
> These are providing the high level readdir vs modification
> serialisation protection. And we have to unlock it on transaction
> commit, which is why it needs to be added to the xfs_trans_ijoin()
> calls...

Makes sense, I think. I'm not sure what the changes to the directory code
would look like.

-Ben