On Fri, May 11, 2018 at 03:18:43AM +0100, Al Viro wrote:
> On Fri, May 11, 2018 at 11:32:08AM +1000, Dave Chinner wrote:
>
> > i.e. we already have code in xfs_setup_inode() that sets the xfs
> > inode ILOCK rwsem dir/non-dir lockdep class before the new inode is
> > unlocked - we could just do the i_rwsem lockdep setup there, too.
>
> ... which would suffice -
>
>         if (S_ISDIR(inode->i_mode)) {
>                 struct file_system_type *type = inode->i_sb->s_type;
>
>                 /* Set new key only if filesystem hasn't already changed it */
>                 if (lockdep_match_class(&inode->i_rwsem, &type->i_mutex_key)) {
>
> in lockdep_annotate_inode_mutex_key() would make sure that ->i_rwsem will be
> left alone by unlock_new_inode().

OK, if you are happy with XFS doing that, I'll put together a patch
and send it out.

> > Then, if we were to factor unlock_new_inode() as Andreas suggested,
> > we could call __unlock_new_inode() from xfs_finish_inode_setup().
>
> No need - if you set the class in xfs_setup_inode(), you are fine.
>
> Said that, hash insertion is also potentially delicate - another ext2/nfsd
> race from the same pile back in 2008 had been
>	* ext2_new_inode() chooses inumber
>	* open-by-fhandle guesses the inumber and hits ext2_iget(), which
> inserts a locked in-core inode into icache and proceeds to block reading
> it from disk.
>	* ext2_new_inode() inserts *its* in-core inode into icache (with
> the same inumber) and sets the things up, both in-core and on disk
>	* open-by-fhandle is back and sees a good live on-disk inode.
> It finishes setting the in-core one up and we'd got *TWO* in-core inodes
> with the same inumber, both hashed, both with dentries, both used by
> syscalls to do IO.  Good times all around - fs corruption is fun.
>
> That was fixed by using insert_inode_locked() in ext2_new_inode(), and doing
> that before the on-disk inode would start looking good.  If it came during
> ext2_iget(), it would've found an in-core inode with that inumber (locked,
> doomed to be rejected), waited for it to come unlocked, see it unhashed
> (since ext2_iget() said it was no good) and inserted its in-core inode
> into hash (after having rechecked that nobody had an in-core inode with
> the same inumber in there, that is).
>
> I'm not familiar enough with XFS icache replacement to tell if anything
> of that sort is a problem there; might be a non-issue for any number
> of reasons.

I'm pretty sure we handle those cases - amongst other things we
don't trust inode numbers in filehandles and so validation of inode
numbers in incoming filehandles is serialised against
allocating/freeing of inodes before it even gets to inode cache
lookups...

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
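
For anyone following along who hasn't looked at the 2008 ext2 fix, here is a
minimal userspace sketch of the rule the quoted text attributes to
insert_inode_locked(): a new in-core inode may only be hashed if no other
hashed inode carries the same inumber. Everything named toy_* or scenario_*
is hypothetical and exists only for illustration; none of it is kernel API.
The model is single-threaded, so where the kernel would wait for the
competing inode to come unlocked, this sketch just returns -EBUSY and lets
the caller retry.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_SLOTS 8

/* Toy in-core inode: only the state needed to model the race. */
struct toy_inode {
	unsigned long ino;
	bool hashed;	/* present in the toy icache */
	bool locked;	/* still being set up (models a "new" inode) */
};

static struct toy_inode *toy_icache[TOY_SLOTS];

static void toy_reset(void)
{
	for (size_t i = 0; i < TOY_SLOTS; i++)
		toy_icache[i] = NULL;
}

/* Count hashed inodes with this inumber; more than one means corruption. */
static size_t toy_count(unsigned long ino)
{
	size_t n = 0;

	for (size_t i = 0; i < TOY_SLOTS; i++)
		if (toy_icache[i] && toy_icache[i]->hashed &&
		    toy_icache[i]->ino == ino)
			n++;
	return n;
}

/* Hash a locked inode unless the same inumber is already hashed. */
static int toy_insert_locked(struct toy_inode *inode)
{
	if (toy_count(inode->ino) > 0)
		return -EBUSY;
	for (size_t i = 0; i < TOY_SLOTS; i++) {
		if (!toy_icache[i] || !toy_icache[i]->hashed) {
			inode->hashed = true;
			inode->locked = true;
			toy_icache[i] = inode;
			return 0;
		}
	}
	return -ENOMEM;
}

/* Lookup decided the inode is no good: unhash it and drop the lock. */
static void toy_reject(struct toy_inode *inode)
{
	inode->hashed = false;
	inode->locked = false;
}

/* Setup finished: the inode stays hashed and becomes usable. */
static void toy_unlock_new(struct toy_inode *inode)
{
	assert(inode->hashed);
	inode->locked = false;
}

/*
 * The by-handle lookup guesses the inumber first and is later rejected.
 * Returns the number of hashed inodes with that inumber, or -1 on surprise.
 */
static int scenario_lookup_first(void)
{
	static struct toy_inode guess, fresh;

	toy_reset();
	guess = (struct toy_inode){ .ino = 42 };
	fresh = (struct toy_inode){ .ino = 42 };

	if (toy_insert_locked(&guess) != 0)		/* lookup gets in first */
		return -1;
	if (toy_insert_locked(&fresh) != -EBUSY)	/* allocator must back off */
		return -1;
	toy_reject(&guess);				/* on-disk inode was not good */
	if (toy_insert_locked(&fresh) != 0)		/* retry after recheck */
		return -1;
	toy_unlock_new(&fresh);
	return (int)toy_count(42);
}

/* The allocator hashes its inode before the lookup arrives. */
static int scenario_allocator_first(void)
{
	static struct toy_inode guess, fresh;

	toy_reset();
	guess = (struct toy_inode){ .ino = 42 };
	fresh = (struct toy_inode){ .ino = 42 };

	if (toy_insert_locked(&fresh) != 0)
		return -1;
	if (toy_insert_locked(&guess) != -EBUSY)	/* no second in-core inode */
		return -1;
	toy_unlock_new(&fresh);
	return (int)toy_count(42);
}
```

In both orderings the toy cache ends up with a single hashed inode for the
inumber, which is the invariant the original race violated. How XFS
achieves the equivalent is a separate question that this sketch does not
try to answer.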