Al Viro <viro@xxxxxxxxxxxxxxxxxx> writes: > On Sun, May 11, 2014 at 05:45:30PM +0100, Al Viro wrote: >> Sigh... It's really messy. >> All versions since lazy fput introduction have acct_auto_close() >> doing the wrong thing on r/o remount of superblock; we want the damn file >> closed *before* we go further than acct_auto_close(). Worse, we are >> holding ->s_umount there, so any kind of waiting would have to be very >> careful to avoid deadlocks. What's more, prevention of open for write >> hits acct_auto_close(), so even if we wait there, we still have a window >> when new acct file could be opened and not auto-closed. >> All versions have problems with acct_process() in the middle of >> umount(); originally it was a blatant call of ->write() happening without >> any regard for file getting closed, then it was file getting written to >> and closed in the middle of fs shutdown, then - write/close capable of >> pushing fs shutdown past the return from umount(2). >> All versions have problems with acct(NULL) vs. umount - the latter >> does not wait for the former. Eric's patches plug that one, but there's >> a serious deadlock potential. > > OK, I think I've sorted that one out. Eric, could you take a look at > vfs.git#for-eric? That's for-next + fix that ought to go into -stable + > delayed-mntput() thing. The real PITA had been kernel/acct.c mess; > that's dealt with in -next. > > I think it solves the problem with "mntput in deep call chain" cases > added in your series. Final mntput() does fs shutdown, etc. on a shallow > stack, via task_work_add() if at all possible. MNT_INTERNAL vfsmounts > are dealt with synchronously, which solves the problem of failure exits > halfway through module_init needing to tear down an internal vfsmount, etc. > But those call sites are all on fairly shallow stack anyway. And such > vfsmounts are not mounted on anything, so it's not something your changes > could possibly step into. No extra context switches per syscall, at that... > > I hadn't added mntput_sync() - no visible use cases. If one shows up, > it wouldn't be hard to add such primitive. And unlike fput() we do not > try to support mntput() from interrupt, etc. - too much PITA with no > obvious use cases. We'd need to decide whether we want to disable IRQs > on lock_mount_hash(), etc. It's doable, but let's leave that until we > get a serious reason to mess with all that. I see the changes. The accounting changes look especially interesting. I don't now if we need a mntput_sync but I will double check as I review the patches. I will aim to start reading through the changes carefully first thing in the morning. Eric -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html