On 7/27/2015 10:51, NeilBrown wrote:
> On Mon, 27 Jul 2015 10:28:52 +0800 Kinglong Mee <kinglongmee@xxxxxxxxx>
> wrote:
>
>> On 7/24/2015 10:05, NeilBrown wrote:
>>> On Mon, 13 Jul 2015 05:45:53 +0100 Al Viro <viro@xxxxxxxxxxxxxxxxxx>
>>> wrote:
>>>
>>>> On Mon, Jul 13, 2015 at 02:20:59PM +1000, NeilBrown wrote:
>>>>
>>>>> Actually, with that change to pin_kill, this side of things becomes
>>>>> really easy.
>>>>> All expXXX_pin_kill needs to do is call your new cache_delete_entry.
>>>>> If that doesn't cause the entry to be put, then something else has a
>>>>> temporary reference which will be put soon. In any case, pin_kill()
>>>>> will wait long enough, but not indefinitely.
>>>>> No need for kref_get_unless_zero() or any of that.
>>>>
>>>> No. You are seriously misunderstanding what ->kill() is for and what the
>>>> existing instances are doing. Again, there is no promise whatsoever that
>>>> the object containing fs_pin instance will *survive* past ->kill().
>>>> At all.
>>>>
>>>> RTFS, please. What is sorely missing in this recurring patchset is a clear
>>>> description of lifetime rules and ordering (who waits for whom and how long).
>>>> For all the objects involved.
>>>
>>> Good point. Let me try.
>>>
>>> Entries in the sunrpc 'cache' each contain some 'key' fields and some
>>> 'content' fields.
>>>
>>> The key fields are set by the .init() method when the entry is
>>> created, which can happen in a call to sunrpc_cache_lookup() or to
>>> sunrpc_cache_update().
>>>
>>> The content fields are set by the .update() method when a value is
>>> provided for the cache entry. This happens in sunrpc_cache_update();
>>>
>>> A cache entry can be not-valid, negative, or valid.
>>> It starts non-valid when sunrpc_cache_lookup() fails to find the search
>>> key and so creates a new entry (and sets up the key with .init).
>>> It then transitions to either negative or valid.
>>> This can happen through sunrpc_cache_update() or through an error when
>>> instigating an up-call, in which case it goes to negative.
>>> Once it is negative or valid, it stays that way until it is released.
>>> If sunrpc_cache_update is called on an entry that is not not-valid,
>>> then a new entry is created and the old one is marked as expired.
>>> A cache search will find the new one before the old.
>>>
>>> The vfsmount object is involved in two separate caches.
>>> It is part of the content of svc_expkey and part of the key of
>>> svc_export.
>>>
>>> An svc_expkey entry is only ever held transiently. It is held while an
>>> update is being processed, and it is held briefly while mapping a
>>> filehandle to a mnt+dentry.
>>> Firstly part of the filehandle is used to access the svc_expkey cache
>>> to get the vfsmnt. Then that vfsmnt plus the client identification is
>>> looked up in the svc_export cache to find the export options. Then the
>>> svc_expkey cache entry is released.
>>>
>>> So it is only held during a lookup of another cache. This can take an
>>> arbitrarily long time as the lookup can go to rpc.mountd in user-space.
>>>
>>>
>>> The svc_export cache entry can be held for the duration of a single NFS
>>> request. It is stored in the 'struct svc_fh' file handle structure
>>> which is released at the end of handling the request.
>>>
>>> The vfsmnt and dentry are only "used" to validate the filehandle and
>>> then while that filehandle is still active.
>>>
>>>
>>> To avoid having unmount hang while nfsd is performing an upcall to
>>> mountd, we need to legitimize the vfsmnt in the svc_expkey. If that
>>> fails, exp_find_key() can fail and we would never perform the lookup on
>>> svc_export.
>>>
>>> If it succeeds, then the legitimacy can be handed over to the svc_export
>>> cache entry, which could then continue to own it, or could hand it on
>>> to the svc_fh.
>>>
>>> The latter is *probably* cleanest.
>>> i.e. an svc_fh should always own a reference to exp->ex_path.mnt, and
>>> fh_put must put it.
>>
>> I don't agree with adding a new argument (e.g. fh_vfsmnt) in svc_fh.
>
> I wasn't suggesting that a new field be added to svc_fh.
> Just that if svc_fh->fh_export was not NULL, then the svc_fh "owned" a
> reference to svc_fh->fh_export->ex_path.mnt which it had to mnt_put()
> when it released ->fh_export.
>
> So fh_put would need to change, but not much else.
>
> It isn't the only way to handle that reference - it just seemed the
> neatest as I was writing the description. Something else might work
> better in the code.

Got it, thanks for your comments.

>
>>
>> With it, should nfsd always use fh_vfsmnt, and never use exp->ex_path.mnt
>> outside of export.c/export.h?
>>
>> If we choose fh_vfsmnt, a lot of code needs to be updated, especially
>> functions.
>> If exp->ex_path.mnt, the new argument fh_vfsmnt seems redundant.
>>
>> Thanks for your work.
>>
>> It reminds me of a new method,
>>
>> 1. There is only one outlet from each cache, exp_find_key() for expkey,
>>    exp_get_by_name() for export.
>> 2. Any fsid to export or filehandle to export will call the function.
>> 3. exp_get()/exp_put() increase/decrease the reference of export.
>>
>> Like the fh_vfsmnt (not the same), call legitimize_mntget() in the only
>> outlet function exp_find_key()/exp_get_by_name(); if it fails return STALE,
>> otherwise, any valid expkey/export from the cache is validated (it has
>> got a reference to the vfsmnt).
>>
>> Add mntget() in exp_get() and mntput() in exp_put(), because the exports
>> passed to exp_get/exp_put are returned from exp_find_key/exp_get_by_name.
>>
>>>
>>> exp_find_key needs to legitimize ek->ek_path.mnt, so a successful
>>> return from exp_find implies an active reference to ->ex_path.mnt.
>>> If exp_find fails, it needs to mnt_put(ek->ek_path.mnt).
>>
>> Yes, it's great.
>>
>>> All callers of exp_find need to mnt_put(exp->ex_path.mnt) when they
>>> decide not to use the exp, and must otherwise store it in an svc_fh.
>>>
>>> With this, pin_kill() should only need to wait for exp_find_key() to
>>> discover that it cannot legitimize the mount, or for expkey_path() to
>>> replace the key via sunrpc_cache_update(), or maybe for cache_clean()
>>> to discard an old entry.
>>>
>>> Hopefully that makes it all clear.
>>
>> Yes, thanks again.
>>
>> With my method, for expkey cache,
>> 1. At first, a fsid is passed to exp_find_key and looked up in the cache
>>    by svc_expkey_lookup; on success, ekey->ek_path is pinned to the mount.
>> 2. Then call legitimize_mntget to get a reference to the vfsmnt
>>    before returning from exp_find_key.
>> 3. Any caller of exp_find_key that gets a valid cache entry must put
>>    the vfsmnt.
>>
>> for export cache,
>> 1. At first, a path (returned from exp_find_key) with a validated vfsmnt
>>    is passed to exp_get_by_name; on success, exp->ex_path is pinned to
>>    the mount.
>> 2. Then call legitimize_mntget to get a reference to the vfsmnt
>>    before returning from exp_get_by_name.
>
> I don't see any point in calling legitimize_mntget here. exp_find_key
> already did the 'legitimize' bit so there is no need to do it again.

I just thought of them as two logically separate steps.
But can the export cache contain a different vfsmnt from the one in expkey?

thanks,
Kinglong Mee
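
P.S. To check my understanding of the ownership rule discussed above
(a successful lookup returns an active mount reference, and fh_put drops
it), here is a small userspace sketch. It is not kernel code and ignores
all locking and concurrency. Every toy_* name is hypothetical, and
toy_legitimize() only stands in for the legitimize_mntget() proposed in
this patchset; I am assuming it means "take a reference unless unmount
has already started", which may not match the final semantics.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a vfsmount: a plain counter plus an "unmounting" flag. */
struct toy_mnt {
	int refs;
	bool dying;
};

/* Toy stand-in for a cached svc_export entry that names a mount. */
struct toy_export {
	struct toy_mnt *mnt;
};

/* Toy stand-in for svc_fh: owns one mount reference while fh_export is set. */
struct toy_fh {
	struct toy_export *fh_export;
};

/* Assumed semantics: take a reference unless unmount has already begun. */
static bool toy_legitimize(struct toy_mnt *m)
{
	if (m->dying)
		return false;
	m->refs++;
	return true;
}

/* Drop one reference; dropping with none held would be a bug. */
static void toy_mntput(struct toy_mnt *m)
{
	assert(m->refs > 0);
	m->refs--;
}

/*
 * The single "outlet" from the cache: success implies the caller now
 * holds an active mount reference; NULL means the caller reports STALE.
 */
static struct toy_export *toy_exp_find(struct toy_export *cached)
{
	if (!toy_legitimize(cached->mnt))
		return NULL;
	return cached;
}

/* Hand the caller's mount reference over to the file handle. */
static void toy_fh_set(struct toy_fh *fh, struct toy_export *exp)
{
	fh->fh_export = exp;
}

/* Releasing the export from the file handle also puts the mount. */
static void toy_fh_put(struct toy_fh *fh)
{
	if (fh->fh_export) {
		toy_mntput(fh->fh_export->mnt);
		fh->fh_export = NULL;
	}
}
```

In this model a caller that decides not to use the result of
toy_exp_find() would call toy_mntput() itself, and otherwise passes the
export to toy_fh_set() so that toy_fh_put() balances the reference.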