Re: NFS Force Unmounting

NeilBrown <neilb@xxxxxxxx> · Fri, 03 Nov 2017 08:51:16 +1100

On Thu, Nov 02 2017, Chuck Lever wrote:

>> On Nov 1, 2017, at 8:15 PM, NeilBrown <neilb@xxxxxxxx> wrote:
>> 
>> On Tue, Oct 31 2017, Chuck Lever wrote:
>> 
>>>> On Oct 31, 2017, at 8:53 PM, NeilBrown <neilb@xxxxxxxx> wrote:
>>> 
>>>> Maybe I could just sweep the problem under the carpet and use lazy
>>>> unmounts.  That hides some of the problem, but doesn't stop sync(2) from
>>>> blocking indefinitely.  And once you have done the lazy unmount, there
>>>> is no longer any opportunity to use MNT_FORCE.
>>> 
>>> IMO a partial answer could be data caching in local files. If
>>> the client can't flush, then it can preserve the files until
>>> after the umount and reboot (using, say, fscache). Multi-client
>>> sharing is still hazardous, but that isn't a very frequent use
>>> case.
>> 
>> What data is it, exactly, that we are worried about here?
>> Data that an application has written, but that it hasn't called fsync()
>> on?  So it isn't really all that important.  It might just be scratch
>> data.  It might be event logs.  It certainly isn't committed database
>> data, or an incoming email message, or data saved by an editor, or
>> really anything else important.
>> It is data that would be lost if you kicked the powerplug out by
>> mistake.
>> It is data that we would rather save if we could, but data that is not
>> worth bending over backwards to keep a copy of in a non-standard
>> location just in case someone really cares.
>> 
>> That's how I see it anyway.
>
> The assumption here is that any data loss or corruption when a
> mount is forcibly removed is strictly the responsibility of
> inadequate application design. Fair enough.
>
> One of my concerns (and I suspect Jeff is also worried about this
> use case) is what to do when tearing down containers that have
> stuck NFS mounts. Here, the host system is not being shut down,
> but the guests need to be shut down cleanly so they can be destroyed.

Container-shutdown is a strong argument that MNT_FORCE is not enough and
that we need SIGKILL to actually remove the process.

Currently if there is pending dirty data on the final unmount of an NFS
filesystem, the superblock hangs around until the data is gone.  This
is good for shutting down containers, but it is a little weird - totally
different to every other filesystem.
If NFS were changed so that the final unmount blocked while there is
dirty data - just like every other filesystem - then we would
probably need a way to "force" that unmount even in a container.
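
For anyone following along, here is a minimal userspace sketch of the
two knobs discussed in this thread: MNT_FORCE and the lazy MNT_DETACH,
both flags to umount2(2).  The helper name unmount_stuck() and its
allow_lazy parameter are hypothetical, made up for illustration; this
is not a proposal for how the kernel or umount(8) should behave.  It
only shows why the ordering matters: once the lazy detach has been
done, the mount point is gone and MNT_FORCE can no longer be applied.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mount.h>

/*
 * Hypothetical helper: try a forced unmount first, and fall back to a
 * lazy detach only if the caller explicitly allows it.
 * Returns 0 on success, -1 with errno set on failure.
 */
int unmount_stuck(const char *target, int allow_lazy)
{
    /* MNT_FORCE asks the filesystem to abort in-flight requests. */
    if (umount2(target, MNT_FORCE) == 0)
        return 0;

    int saved = errno;  /* fprintf() may clobber errno */
    fprintf(stderr, "forced unmount of %s failed: %s\n",
            target, strerror(saved));

    if (!allow_lazy) {
        errno = saved;
        return -1;
    }

    /*
     * MNT_DETACH removes the mount from the namespace immediately and
     * cleans up once it is no longer busy.  After this call there is
     * no mount point left to pass to MNT_FORCE.
     */
    return umount2(target, MNT_DETACH);
}
```

Calling it needs CAP_SYS_ADMIN; without that, or with a path that is
not a mount point, both calls fail and the function returns -1.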

Thanks,
NeilBrown

>
> These containers may be sharing page cache data or a transport
> with other users on the host. In this case, we are trying to
> avoid a host shutdown to recover the stuck resources, and we
> do have the (possibly unnecessary) luxury of having a human
> being present to ask what to do to complete the recovery.
>
>
> --
> Chuck Lever