Re: [PATCH 10/10 v7] nfsd: Allows user un-mounting filesystem where nfsd exports base on

NeilBrown <neilb@xxxxxxxx> · Fri, 24 Jul 2015 12:05:15 +1000

On Mon, 13 Jul 2015 05:45:53 +0100 Al Viro <viro@xxxxxxxxxxxxxxxxxx>
wrote:

> On Mon, Jul 13, 2015 at 02:20:59PM +1000, NeilBrown wrote:
> 
> > Actually, with that change to pin_kill, this side of things becomes
> > really easy.
> > All expXXX_pin_kill needs to do is call your new cache_delete_entry.
> > If that doesn't cause the entry to be put, then something else has a
> > temporary reference which will be put soon.  In any case, pin_kill()
> > will wait long enough, but not indefinitely.
> > No need for kref_get_unless_zero() or any of that.
> 
> No.  You are seriously misunderstanding what ->kill() is for and what the
> existing instances are doing.  Again, there is no promise whatsoever that
> the object containing fs_pin instance will *survive* past ->kill().
> At all.
> 
> RTFS, please.  What is sorely missing in this recurring patchset is a clear
> description of lifetime rules and ordering (who waits for whom and how long).
> For all the objects involved.

Good point.  Let me try.

Entries in the sunrpc 'cache' each contain some 'key' fields and some
'content' fields.

The key fields are set by the .init() method when the entry is
created, which can happen in a call to sunrpc_cache_lookup() or to
sunrpc_cache_update().

The content fields are set by the .update() method when a value is
provided for the cache entry.  This happens in sunrpc_cache_update();

A cache entry can be not-valid, negative, or valid.
It starts non-valid when sunrpc_cache_lookup() fails to find the search
key and so creates a new entry (and sets up the key with .init).
It then transitions to either negative or valid.
This can happen through sunrpc_cache_update() or through an error when
instigating an up-call, in which case it goes to negative.
Once it is negative or valid, it stays that way until it is released.
If sunrpc_cache_update is called on an entry that is not not-valid,
then a new entry is created and the old one is marked as expired.
A cache search will find the new one before the old.

The vfsmount object is involved in two separate caches.
It is part of the content of svc_expkey and part of the key of
svc_export.

An svc_expkey entry is only ever held transiently.  It is held while an
update is being processed, and it is held briefly while mapping a
filehandle to a mnt+dentry.
Firstly part of the filehandle is used to acccess the svc_expkey cache
to get the vfsmnt.  Then that vfsmnt plus the client identification is
looked up in the svc_export cache to find the export options.  Then the
svc_expkey cache entry is released.

So it is only held during a lookup of another cache.  This can take an
arbitrarily long time as the lookup can go to rpc.mountd in user-space.

The svc_export cache entry can be held for the duration of a single NFS
request.  It is stored in the 'struct svc_fh' file handle structure
which is release at the end of handling the request.

The vfsmnt and dentry are only "used" to validate the filehandle and
then while that filehandle is still active.

To avoid having unmount hang while nfsd is performing an upcall to
mountd, we need to legitimize the vfsmnt in the svc_expkey.  If that
fails, exp_find_key() can fail and we would never perform the lookup on
svc_export.

If it succeeds, then the legitimacy can be handed over to the svc_export
cache entry, which could then continue to own it, or could hand it on
to the svc_fh.

The latter is *probably* cleanest.
i.e. an svc_fh should always own a reference to exp->ex_path.mnt, and
fh_put must put it.

exp_find_key needs to legitimize ek->ek_path.mnt, so a successful
return from exp_find implies an active refernece to ->ex_path.mnt.
If exp_find fails, it needs to mnt_put(ek->ek_path.mnt).
All callers of exp_find need to mnt_put(exp->ex_path.mnt) when they
decide not to use the exp, and must otherwise store it in an svc_fh.

With this, pin_kill() should only need to wait for  exp_find_key() to
discover that it cannot legitimize the mount, or for expkey_path() to
replace the key via sunrpc_cache_update(), or maybe for cache_clean()
to discard an old entry.

Hopefully that makes it all clear.

Thanks,
NeilBrown
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html