Re: [GIT PULL] Detaching mounts on unlink for 3.15

ebiederm@xxxxxxxxxxxx (Eric W. Biederman) · Sat, 09 Aug 2014 02:55:10 -0700

Al Viro <viro@xxxxxxxxxxxxxxxxxx> writes:

> On Sun, May 11, 2014 at 05:45:30PM +0100, Al Viro wrote:
>> Sigh...  It's really messy.
>> 	All versions since lazy fput introduction have acct_auto_close()
>> doing the wrong thing on r/o remount of superblock; we want the damn file
>> closed *before* we go further than acct_auto_close().  Worse, we are
>> holding ->s_umount there, so any kind of waiting would have to be very
>> careful to avoid deadlocks.  What's more, prevention of open for write
>> hits acct_auto_close(), so even if we wait there, we still have a window
>> when new acct file could be opened and not auto-closed.
>> 	All versions have problems with acct_process() in the middle of
>> umount(); originally it was a blatant call of ->write() happening without
>> any regard for file getting closed, then it was file getting written to
>> and closed in the middle of fs shutdown, then - write/close capable of
>> pushing fs shutdown past the return from umount(2).
>> 	All versions have problems with acct(NULL) vs. umount - the latter
>> does not wait for the former.  Eric's patches plug that one, but there's
>> a serious deadlock potential.
>
> OK, I think I've sorted that one out.  Eric, could you take a look at
> vfs.git#for-eric?  That's for-next + fix that ought to go into -stable +
> delayed-mntput() thing.  The real PITA had been kernel/acct.c mess;
> that's dealt with in -next.
>
> I think it solves the problem with "mntput in deep call chain" cases
> added in your series.  Final mntput() does fs shutdown, etc. on a shallow
> stack, via task_work_add() if at all possible.  MNT_INTERNAL vfsmounts
> are dealt with synchronously, which solves the problem of failure exits
> halfway through module_init needing to tear down an internal vfsmount, etc.
> But those call sites are all on fairly shallow stack anyway.  And such
> vfsmounts are not mounted on anything, so it's not something your changes
> could possibly step into.  No extra context switches per syscall, at that...
>
> I hadn't added mntput_sync() - no visible use cases.  If one shows up,
> it wouldn't be hard to add such primitive.  And unlike fput() we do not
> try to support mntput() from interrupt, etc. - too much PITA with no
> obvious use cases.  We'd need to decide whether we want to disable IRQs
> on lock_mount_hash(), etc.  It's doable, but let's leave that until we
> get a serious reason to mess with all that.

I see the changes.  The accounting changes look especially interesting.
I don't now if we need a mntput_sync but I will double check as I review
the patches.

I will aim to start reading through the changes carefully first thing in
the morning.

Eric

--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html