On Tue, May 05, 2009 at 03:40:36PM +0200, Christoph Hellwig wrote: > From: Christoph Hellwig <hch@xxxxxx> > > Move s_lock and BKL into ->put_super from the only caller. A couple of > filesystems had trivial enough ->put_super (only kfree and NULLing of > s_fs_info + stuff in there) to not get any locking: code, cramfs, efs, > hugetlbfs, omfs, qnx4, shmem, all others got the full treatment. Most > of them probably don't need it, but I'd rather sort that out individually. > Preferably after all the other BKL and lock_super pushdowns in that area. NAK. Getting BKL down there is fine, but lock_super() in that place should simply die now. Look: lock_super() does two things - exclusion on s_lock and marking the task as "holds excl fs resource, other tasks are likely to be blocked by it, try to push its IO at higher priority" for CFQ. Let's consider the exclusion first. We *can't* get contention on s_lock at that point. The only thing that takes s_lock is lock_super() itself. Callers: generic_shutdown_super (this one, s_umount held exclusive since it's always called from ->kill_sb) [1] write_super() from sync_supers() - s_umount held shared do_remount_sb() - s_umount held [2] file_fsync() - we shouldn't ever be in generic_shutdown_super() while there are files opened on that superblock, TYVM __sync_filesystem() - called from sync_filesystems() (s_umount shared) and from sync_filesystem(). Callers of sync_filesystem(): fsync_bdev(), freeze_bdev() - s_umount shared. fs/cachefiles - there we have a vfsmount with ->mnt_sb pointing to that superblock, so generic_shutdown_super() shouldn't be possible (active reference to superblock) and generic_shutdown_super() itself. Again, s_umount exclusive and in any case, there's only one call of that sucker in the entire lifetime of given struct super_block. So we can't have contention on s_lock in that area - any such contention would either happen on s_umount or would be a result of severely buggered refcounting on struct super_block. That leaves us with fs_excl alone. And fair enough, this thing *does* hold exclusive fs resources. Only one, in fact - s_umount. Nothing is going to wait for it to finish in order to do something with fs itself. So if we are going to bump fs_excl, we ought to do it in deactivate_super(), not here. After all, ->put_super() is not where the majority of IO on final umount will be happening... As for this patch: we need to * replace lock_super/unlock_super() with get_fs_excl()/put_fs_excl() in the same places. * don't do lock_super() inside ->put_super() at all. * take BKL down to ->put_super(). I'm going to put it in as two patches - lock_super part + trimmed down version of this one, sans lock_super. [1] and from grepping through the instances... some of the callers are seriously fishy. All of them *are* in ->kill_sb() and not called other that via deactivate_super(). [2] normally exclusive, but there are callers holding it shared; all of them should switch to excl, but that doesn't matter for our purposes. -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html