Re: [PATCH v2] vfs: introduce UMOUNT_WAIT which waits for umount completion

Al Viro <viro@xxxxxxxxxxxxxxxxxx> · Thu, 21 Sep 2017 03:42:52 +0100

On Wed, Sep 20, 2017 at 05:34:09PM -0700, Jaegeuk Kim wrote:
> > 	flush_delayed_fput()
> > 		does nothing, the list is empty
> 
> 		how about waiting for workqueue completion here?
> 
> > 	....
> 
> 	If all the __fput()s are not finished, do_umount() will return -EBUSY.

Hell, no.  That's only when they are all on the same vfsmount.  And in that
case you don't need any waiting - if any of those mntput() is not past the
unlock_mount_hash() in mntput_no_expire(), you will get -EBUSY.  And if they
all are, the caller of umount(2) will end up dropping the last reference.  
In which case the shutdown will be scheduled via task_work_add() and processed
before umount(2) returns to userland.

The whole problem is that you have several vfsmounts over the same filesystem
(== same struct super_block), some of them held by kernel threads of yours.
umount(2) doesn't affect those and isn't affected by those.  What you do is,
AFAICS,
	ask the kernel threads to start shutting down
	umount()
	shut device down, hoping that all vfsmounts that used
	to be held by those threads are gone by that point.

Your patch tries to stick "flush the pending work" into umount(),
with no guarantee that it will catch that stuff at a stage where
flushing will affect anything.

> +void flush_delayed_fput_wait(void)
> +{
> +	delayed_fput(NULL);
> +	flush_delayed_work(&delayed_fput_work);
> +}

> +void flush_delayed_mntput_wait(void)
> +{
> +	delayed_mntput(NULL);
> +	flush_delayed_work(&delayed_mntput_work);
> +}

It's still a broken approach.  What I don't understand is why bother
with that sort of brittle logic in the first place.  Why not simply
open the damn thing with O_EXCL before proceeding to device shutdown?
And if you get "busy" from that, wait and retry...
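
For illustration, the O_EXCL suggestion could look roughly like this from
userland.  This is a minimal sketch, assuming the shutdown is driven by a
userspace process and that the thing being opened is the block device
itself (on Linux an exclusive open of a block device fails with EBUSY
while something else, e.g. a live superblock, still holds it).
open_excl_retry() and its parameters are hypothetical names, not anything
from the patch.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: try to claim a block device exclusively,
 * sleeping and retrying for as long as the kernel reports EBUSY.
 * Returns an open fd on success, -1 with errno set on failure.
 * Any error other than EBUSY is returned immediately.
 */
int open_excl_retry(const char *dev, int max_tries, long delay_ms)
{
	struct timespec ts = {
		.tv_sec  = delay_ms / 1000,
		.tv_nsec = (delay_ms % 1000) * 1000000L,
	};

	for (int i = 0; i < max_tries; i++) {
		/* O_EXCL without O_CREAT is only meaningful for block devices */
		int fd = open(dev, O_RDONLY | O_EXCL);

		if (fd >= 0)
			return fd;	/* nobody else holds the device */
		if (errno != EBUSY)
			return -1;	/* real error, do not retry */
		nanosleep(&ts, NULL);	/* still busy: wait, then retry */
	}
	errno = EBUSY;
	return -1;
}
```

The caller would proceed to device shutdown only once this returns a
valid fd, and would close it afterwards (or keep it open for as long as
it needs the device to stay unclaimed).  The retry count and delay are
arbitrary knobs here.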