On 12/8/21 8:16 AM, Trond Myklebust wrote:
On Wed, 2021-12-08 at 07:54 -0800, dai.ngo@xxxxxxxxxx wrote:
On 12/6/21 11:55 AM, Chuck Lever III wrote:
+
+/*
+ * Function to check if the nfserr_share_denied error for 'fp'
resulted
+ * from conflict with courtesy clients then release their state to
resolve
+ * the conflict.
+ *
+ * Function returns:
+ * 0 - no conflict with courtesy clients
+ * >0 - conflict with courtesy clients resolved, try
access/deny check again
+ * -1 - conflict with courtesy clients being resolved in
background
+ * return nfserr_jukebox to NFS client
+ */
+static int
+nfs4_destroy_clnts_with_sresv_conflict(struct svc_rqst *rqstp,
+ struct nfs4_file *fp, struct
nfs4_ol_stateid *stp,
+ u32 access, bool share_access)
+{
+ int cnt = 0;
+ int async_cnt = 0;
+ bool no_retry = false;
+ struct nfs4_client *cl;
+ struct list_head *pos, *next, reaplist;
+ struct nfsd_net *nn = net_generic(SVC_NET(rqstp),
nfsd_net_id);
+
+ INIT_LIST_HEAD(&reaplist);
+ spin_lock(&nn->client_lock);
+ list_for_each_safe(pos, next, &nn->client_lru) {
+ cl = list_entry(pos, struct nfs4_client, cl_lru);
+ /*
+ * check all nfs4_ol_stateid of this client
+ * for conflicts with 'access'mode.
+ */
+ if (nfs4_check_deny_bmap(cl, fp, stp, access,
share_access)) {
+ if (!test_bit(NFSD4_COURTESY_CLIENT, &cl-
cl_flags)) {
+ /* conflict with non-courtesy
client */
+ no_retry = true;
+ cnt = 0;
+ goto out;
+ }
+ /*
+ * if too many to resolve synchronously
+ * then do the rest in background
+ */
+ if (cnt > 100) {
+ set_bit(NFSD4_DESTROY_COURTESY_CLIE
NT, &cl->cl_flags);
+ async_cnt++;
+ continue;
+ }
+ if (mark_client_expired_locked(cl))
+ continue;
+ cnt++;
+ list_add(&cl->cl_lru, &reaplist);
+ }
+ }
Bruce suggested simply returning NFS4ERR_DELAY for all cases.
That would simplify this quite a bit for what is a rare edge
case.
If we always do this asynchronously by returning NFS4ERR_DELAY
for all cases then the following pynfs tests need to be modified
to handle the error:
RENEW3 st_renew.testExpired :
FAILURE
LKU10 st_locku.testTimedoutUnlock :
FAILURE
CLOSE9 st_close.testTimedoutClose2 :
FAILURE
and any new tests that opens file have to be prepared to handle
NFS4ERR_DELAY due to the lack of destroy_clientid in 4.0.
Do we still want to take this approach?
NFS4ERR_DELAY is a valid error for both CLOSE and LOCKU (see RFC7530
section 13.2 https://urldefense.com/v3/__https://datatracker.ietf.org/doc/html/rfc7530*section-13.2__;Iw!!ACWV5N9M2RV99hQ!f8vZHAJophxXdSSJvnxDCSBSRpWFxEOZBo2ZLvjPzXLVrvMYR8RKcc0_Jvjhng$
) so if pynfs complains, then it needs fixing regardless.
RENEW, on the other hand, cannot return NFS4ERR_DELAY, but why would it
need to? Either the lease is still valid, or else someone is already
trying to tear it down due to an expiration event. I don't see why
courtesy locks need to add any further complexity to that test.
RENEW fails in the 2nd open:
c.create_confirm(t.word(), access=OPEN4_SHARE_ACCESS_BOTH,
deny=OPEN4_SHARE_DENY_BOTH) <<====== DENY_BOTH
sleeptime = c.getLeaseTime() * 2
env.sleep(sleeptime)
c2 = env.c2
c2.init_connection()
c2.open_confirm(t.word(), access=OPEN4_SHARE_ACCESS_READ, <<=== needs to handle NFS4ERR_DELAY
deny=OPEN4_SHARE_DENY_NONE)
CLOSE and LOCKU also fail in the OPEN, similar to the RENEW test.
Any new pynfs 4.0 test that does open might get NFS4ERR_DELAY.
-Dai