Re: [PATCH RFC v6 2/2] nfsd: Initial implementation of NFSv4 Courteous Server

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 12/8/21 8:39 AM, bfields@xxxxxxxxxxxx wrote:
On Wed, Dec 08, 2021 at 08:25:28AM -0800, dai.ngo@xxxxxxxxxx wrote:
On 12/8/21 8:16 AM, Trond Myklebust wrote:
On Wed, 2021-12-08 at 07:54 -0800, dai.ngo@xxxxxxxxxx wrote:
On 12/6/21 11:55 AM, Chuck Lever III wrote:

+
+/*
+ * Function to check if the nfserr_share_denied error for 'fp'
resulted
+ * from conflict with courtesy clients then release their state to
resolve
+ * the conflict.
+ *
+ * Function returns:
+ *      0 -  no conflict with courtesy clients
+ *     >0 -  conflict with courtesy clients resolved, try
access/deny check again
+ *     -1 -  conflict with courtesy clients being resolved in
background
+ *            return nfserr_jukebox to NFS client
+ */
+static int
+nfs4_destroy_clnts_with_sresv_conflict(struct svc_rqst *rqstp,
+                       struct nfs4_file *fp, struct
nfs4_ol_stateid *stp,
+                       u32 access, bool share_access)
+{
+       int cnt = 0;
+       int async_cnt = 0;
+       bool no_retry = false;
+       struct nfs4_client *cl;
+       struct list_head *pos, *next, reaplist;
+       struct nfsd_net *nn = net_generic(SVC_NET(rqstp),
nfsd_net_id);
+
+       INIT_LIST_HEAD(&reaplist);
+       spin_lock(&nn->client_lock);
+       list_for_each_safe(pos, next, &nn->client_lru) {
+               cl = list_entry(pos, struct nfs4_client, cl_lru);
+               /*
+                * check all nfs4_ol_stateid of this client
+                * for conflicts with 'access'mode.
+                */
+               if (nfs4_check_deny_bmap(cl, fp, stp, access,
share_access)) {
+                       if (!test_bit(NFSD4_COURTESY_CLIENT, &cl-
cl_flags)) {
+                               /* conflict with non-courtesy
client */
+                               no_retry = true;
+                               cnt = 0;
+                               goto out;
+                       }
+                       /*
+                        * if too many to resolve synchronously
+                        * then do the rest in background
+                        */
+                       if (cnt > 100) {
+                               set_bit(NFSD4_DESTROY_COURTESY_CLIE
NT, &cl->cl_flags);
+                               async_cnt++;
+                               continue;
+                       }
+                       if (mark_client_expired_locked(cl))
+                               continue;
+                       cnt++;
+                       list_add(&cl->cl_lru, &reaplist);
+               }
+       }
Bruce suggested simply returning NFS4ERR_DELAY for all cases.
That would simplify this quite a bit for what is a rare edge
case.
If we always do this asynchronously by returning NFS4ERR_DELAY
for all cases then the following pynfs tests need to be modified
to handle the error:

RENEW3   st_renew.testExpired                                     :
FAILURE
LKU10    st_locku.testTimedoutUnlock                              :
FAILURE
CLOSE9   st_close.testTimedoutClose2                              :
FAILURE

and any new tests that opens file have to be prepared to handle
NFS4ERR_DELAY due to the lack of destroy_clientid in 4.0.

Do we still want to take this approach?
NFS4ERR_DELAY is a valid error for both CLOSE and LOCKU (see RFC7530
section 13.2 https://urldefense.com/v3/__https://datatracker.ietf.org/doc/html/rfc7530*section-13.2__;Iw!!ACWV5N9M2RV99hQ!f8vZHAJophxXdSSJvnxDCSBSRpWFxEOZBo2ZLvjPzXLVrvMYR8RKcc0_Jvjhng$
) so if pynfs complains, then it needs fixing regardless.

RENEW, on the other hand, cannot return NFS4ERR_DELAY, but why would it
need to? Either the lease is still valid, or else someone is already
trying to tear it down due to an expiration event. I don't see why
courtesy locks need to add any further complexity to that test.
RENEW fails in the 2nd open:

     c.create_confirm(t.word(), access=OPEN4_SHARE_ACCESS_BOTH,
                      deny=OPEN4_SHARE_DENY_BOTH)     <<======   DENY_BOTH
     sleeptime = c.getLeaseTime() * 2
     env.sleep(sleeptime)
     c2 = env.c2
     c2.init_connection()
     c2.open_confirm(t.word(), access=OPEN4_SHARE_ACCESS_READ,    <<=== needs to handle NFS4ERR_DELAY
                     deny=OPEN4_SHARE_DENY_NONE)

CLOSE and LOCKU also fail in the OPEN, similar to the RENEW test.
Any new pynfs 4.0 test that does open might get NFS4ERR_DELAY.
So it's a RENEW test, not the RENEW operation.

A general-purpose client always has to be prepared for DELAY on OPEN.
But pynfs isn't a general-purpose client, and it assumes that it's the
only one touching the files and directories it creates.

Within pynfs we've got a problem that the tests don't necessarily clean
up after themselves completely, so in theory a test could interfere with
later results.

But each test uses its own files--e.g. in the fragment above note that
the file it's testing gets the name t.word(), which is by design unique
to that test.  So it shouldn't be hitting any conflicts with state held
by previous tests.  Am I missing something?

Both calls, c.create_confirm and c2.open_confirm, use the same file name
t.word().

However, it's strange that if I run RENEW3 by itself then it passes.
From the network trace, env.c2/c2.init_connection generates a SETCLIENTID
which might get rid off all the states of the previous client since
they're the same.

I will investigate why it fails when running 'all' with NFS4ERR_DELAY.

Do you know if there is an option to specify a list of tests to run,
instead of 'all'?

-Dai




[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux