On Thu, Apr 17, 2014 at 11:12:03PM +0100, Al Viro wrote:

> I'd probably turn mntput_no_expire() into something like
> static struct mount *__mntput(struct mount *m)
> that would return NULL if nothing needs to be killed and returned m
> if m really needs killing.  Leaving the caller to decide what to do
> with that puppy.  We have, as it is, exactly two callers - exit path
> in sys_umount() and mntput().  So we add two more functions:
> static void kill_mnt_async(struct mount *m)
> and
> static void kill_mnt_sync(struct mount *m)
> both being no-op on NULL.  Then in sys_umount() and mntput() we do
> 	kill_mnt_async(__mntput(mnt));
> and in mntput_sync() - kill_mnt_sync(__mntput(mnt));
> For that matter, kill_mnt_sync() (basically, your variant with completions)
> can be folded into mntput_sync().

Actually, all kern_unmount() callers are doing that from fairly shallow
stack depth and all simple_release_fs() ones are dealing with rather
trivial ->kill_sb().  So mntput_sync() is an overkill; all we need is
	if (mnt->mnt_flags & MNT_INTERNAL) {
		cleanup_mnt(mnt);
		return;
	}
	<do task_work_add or schedule_delayed_work song and dance>
right in the end of mntput_no_expire().

OK, now I have something that looks like a complete solution.  The last
missing bit is to take all filp_close() of acct->file to kernel thread,
and have them done via __fput_sync() there.  Then auto-close (done from
cleanup_mnt()) will consist of shutting down all affected acct and waiting
for that kernel thread to run through everything currently in its queue.
That'll take care of waiting until acct(NULL) done by somebody else gets
through closing the file and through corresponding mntput().  And *those*
mntput() also can be synchronous - they are clones of the one we hadn't
finished shutting down yet, so both dput() and deactivate_super() will
bugger off immediately.  So we just mark those instead-of-mnt_pin() clones
as MNT_INTERNAL.  Voila.
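
To make the MNT_INTERNAL shortcut described above concrete, here is a rough userspace model in C. It is only a sketch: the struct layout, the single-integer refcount, the `cleaned` field, the deferred list and the helpers `defer_cleanup()` and `run_deferred()` are all assumptions made for illustration, and the list merely stands in for the task_work_add / schedule_delayed_work path. None of this is the kernel's actual code.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model only: names mirror the mail, definitions are made up. */
#define MNT_INTERNAL 0x4000	/* flag value chosen for this sketch */

struct mount {
	int mnt_count;			/* simplified reference count */
	int mnt_flags;
	int cleaned;			/* set once cleanup_mnt() has run */
	struct mount *next_deferred;	/* link in the deferred list */
};

/* Stands in for the task_work / delayed-work machinery. */
static struct mount *deferred_head;

static void cleanup_mnt(struct mount *mnt)
{
	/* The real thing would do dput() and deactivate_super() here. */
	mnt->cleaned = 1;
}

/* Hypothetical helper: postpone the cleanup to a later context. */
static void defer_cleanup(struct mount *mnt)
{
	mnt->next_deferred = deferred_head;
	deferred_head = mnt;
}

/* Hypothetical helper: what the deferred context would eventually do. */
static void run_deferred(void)
{
	while (deferred_head) {
		struct mount *mnt = deferred_head;

		deferred_head = mnt->next_deferred;
		mnt->next_deferred = NULL;
		cleanup_mnt(mnt);
	}
}

static void mntput_no_expire(struct mount *mnt)
{
	if (--mnt->mnt_count > 0)
		return;			/* not the last reference */

	if (mnt->mnt_flags & MNT_INTERNAL) {
		cleanup_mnt(mnt);	/* shallow stack, do it right here */
		return;
	}
	defer_cleanup(mnt);		/* everything else goes the async way */
}
```

In this model an MNT_INTERNAL mount is cleaned up before the final mntput_no_expire() returns, while any other mount stays untouched until run_deferred() gets called.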
After that ->mnt_pinned crap dies, acct auto-close ought to be race-free
and we get the actual fs shutdown guaranteed to be on shallow stack,
without extra context switches, etc. in the normal case.

Let's see if that survives testing...
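
For readers trying to picture the acct auto-close ordering, the following single-threaded C model may help. It is a sketch under loud assumptions: `struct acct_file`, the fixed-size FIFO, `queue_close()`, `worker_step()` and `acct_auto_close()` are hypothetical names, and running the worker in a loop stands in for sleeping until the kernel thread has caught up. The one point it tries to show is the barrier: auto-close waits for everything that was in the queue at that moment, including closes queued earlier by somebody else, and not for anything queued later.

```c
#include <assert.h>
#include <stddef.h>

#define QUEUE_MAX 16

/* Hypothetical stand-in for the file held by one acct instance. */
struct acct_file {
	int closed;
};

/* FIFO of pending closes, consumed by the (modelled) kernel thread. */
static struct acct_file *queue[QUEUE_MAX];
static size_t q_head;	/* next entry to process */
static size_t q_tail;	/* next free slot */

/* Queue a close request; 0 on success, -1 if the toy queue is full. */
static int queue_close(struct acct_file *f)
{
	if (q_tail == QUEUE_MAX)
		return -1;
	queue[q_tail++] = f;
	return 0;
}

/* One step of the worker: close the oldest queued file, if any. */
static int worker_step(void)
{
	if (q_head == q_tail)
		return 0;
	queue[q_head++]->closed = 1;	/* real code: __fput_sync() */
	return 1;
}

/* Auto-close: queue our own file, then wait for all that is queued so far. */
static void acct_auto_close(struct acct_file *mine)
{
	size_t barrier;

	queue_close(mine);
	barrier = q_tail;	/* snapshot of "currently in its queue" */
	while (q_head < barrier)
		worker_step();	/* models waiting for the thread */
}
```

A close queued before acct_auto_close() is finished by the time it returns; one queued afterwards is left for the worker to pick up on its own schedule.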