Re: [PATCH RFC v6 2/2] nfsd: Initial implementation of NFSv4 Courteous Server

dai.ngo@xxxxxxxxxx · Wed, 8 Dec 2021 08:25:28 -0800

On 12/8/21 8:16 AM, Trond Myklebust wrote:
On Wed, 2021-12-08 at 07:54 -0800, dai.ngo@xxxxxxxxxx wrote:
On 12/6/21 11:55 AM, Chuck Lever III wrote:

+
+/*
+ * Function to check if the nfserr_share_denied error for 'fp'
resulted
+ * from conflict with courtesy clients then release their state to
resolve
+ * the conflict.
+ *
+ * Function returns:
+ *      0 -  no conflict with courtesy clients
+ *     >0 -  conflict with courtesy clients resolved, try
access/deny check again
+ *     -1 -  conflict with courtesy clients being resolved in
background
+ *            return nfserr_jukebox to NFS client
+ */
+static int
+nfs4_destroy_clnts_with_sresv_conflict(struct svc_rqst *rqstp,
+                       struct nfs4_file *fp, struct
nfs4_ol_stateid *stp,
+                       u32 access, bool share_access)
+{
+       int cnt = 0;
+       int async_cnt = 0;
+       bool no_retry = false;
+       struct nfs4_client *cl;
+       struct list_head *pos, *next, reaplist;
+       struct nfsd_net *nn = net_generic(SVC_NET(rqstp),
nfsd_net_id);
+
+       INIT_LIST_HEAD(&reaplist);
+       spin_lock(&nn->client_lock);
+       list_for_each_safe(pos, next, &nn->client_lru) {
+               cl = list_entry(pos, struct nfs4_client, cl_lru);
+               /*
+                * check all nfs4_ol_stateid of this client
+                * for conflicts with 'access'mode.
+                */
+               if (nfs4_check_deny_bmap(cl, fp, stp, access,
share_access)) {
+                       if (!test_bit(NFSD4_COURTESY_CLIENT, &cl-
cl_flags)) {
+                               /* conflict with non-courtesy
client */
+                               no_retry = true;
+                               cnt = 0;
+                               goto out;
+                       }
+                       /*
+                        * if too many to resolve synchronously
+                        * then do the rest in background
+                        */
+                       if (cnt > 100) {
+                               set_bit(NFSD4_DESTROY_COURTESY_CLIE
NT, &cl->cl_flags);
+                               async_cnt++;
+                               continue;
+                       }
+                       if (mark_client_expired_locked(cl))
+                               continue;
+                       cnt++;
+                       list_add(&cl->cl_lru, &reaplist);
+               }
+       }
Bruce suggested simply returning NFS4ERR_DELAY for all cases.
That would simplify this quite a bit for what is a rare edge
case.
If we always do this asynchronously by returning NFS4ERR_DELAY
for all cases then the following pynfs tests need to be modified
to handle the error:

RENEW3   st_renew.testExpired                                     :
FAILURE
LKU10    st_locku.testTimedoutUnlock                              :
FAILURE
CLOSE9   st_close.testTimedoutClose2                              :
FAILURE

and any new tests that opens file have to be prepared to handle
NFS4ERR_DELAY due to the lack of destroy_clientid in 4.0.

Do we still want to take this approach?
NFS4ERR_DELAY is a valid error for both CLOSE and LOCKU (see RFC7530
section 13.2 https://urldefense.com/v3/__https://datatracker.ietf.org/doc/html/rfc7530*section-13.2__;Iw!!ACWV5N9M2RV99hQ!f8vZHAJophxXdSSJvnxDCSBSRpWFxEOZBo2ZLvjPzXLVrvMYR8RKcc0_Jvjhng$
) so if pynfs complains, then it needs fixing regardless.

RENEW, on the other hand, cannot return NFS4ERR_DELAY, but why would it
need to? Either the lease is still valid, or else someone is already
trying to tear it down due to an expiration event. I don't see why
courtesy locks need to add any further complexity to that test.

RENEW fails in the 2nd open:

    c.create_confirm(t.word(), access=OPEN4_SHARE_ACCESS_BOTH,
                     deny=OPEN4_SHARE_DENY_BOTH)     <<======   DENY_BOTH
    sleeptime = c.getLeaseTime() * 2
    env.sleep(sleeptime)
    c2 = env.c2
    c2.init_connection()
    c2.open_confirm(t.word(), access=OPEN4_SHARE_ACCESS_READ,    <<=== needs to handle NFS4ERR_DELAY
                    deny=OPEN4_SHARE_DENY_NONE)

CLOSE and LOCKU also fail in the OPEN, similar to the RENEW test.
Any new pynfs 4.0 test that does open might get NFS4ERR_DELAY.

-Dai