Re: [PATCH RFC 1/1] nfsd: Initial implementation of NFSv4 Courteous Server

dai.ngo@xxxxxxxxxx · Wed, 30 Jun 2021 01:41:32 -0700





On 6/29/21 6:35 PM, J. Bruce Fields wrote:
On Mon, Jun 28, 2021 at 09:40:56PM -0700, dai.ngo@xxxxxxxxxx wrote:
On 6/28/21 4:39 PM, dai.ngo@xxxxxxxxxx wrote:
On 6/28/21 1:23 PM, J. Bruce Fields wrote:
On Thu, Jun 03, 2021 at 02:14:38PM -0400, Dai Ngo wrote:
@@ -6875,7 +6947,12 @@ nfsd4_lock(struct svc_rqst *rqstp,
struct nfsd4_compound_state *cstate,
       case -EAGAIN:        /* conflock holds conflicting lock */
           status = nfserr_denied;
           dprintk("NFSD: nfsd4_lock: conflicting lock found!\n");
-        nfs4_set_lock_denied(conflock, &lock->lk_denied);
+
+        /* try again if conflict with courtesy client  */
+        if (nfs4_set_lock_denied(conflock, &lock->lk_denied)
== -EAGAIN && !retried) {
+            retried = true;
+            goto again;
+        }
Ugh, apologies, this was my idea, but I just noticed it only
handles conflicts
>from other NFSv4 clients.  The conflicting lock could just as
well come from
NLM or a local process.  So we need cooperation from the common
locks.c code.

I'm not sure what to suggest....
One option is to use locks_copy_conflock/nfsd4_fl_get_owner to detect
the lock being copied belongs to a courtesy client and schedule the
laundromat to run to destroy the courtesy client. This option requires
callers of vfs_lock_file to provide the 'conflock' argument.
I'm not sure I follow.  What's the advantage of doing it this way?

I'm not sure it's an advantage but I was trying to minimize changes to
the fs code. The only change we need is to add the conflock argument
to do_lock_file_wait to handle local lock conflicts.

If you don't think we're going to get objection with the new callback,
fl_expire_lock, then I will take that approach. We still need to add
the conflock argument to do_lock_file_wait in this case.


Regarding local lock conflick, do_lock_file_wait calls vfs_lock_file and
just block waiting for the lock to be released. Both of the options
above do not handle the case where the local lock happens before the
v4 client expires and becomes courtesy client. In this case we can not
let the v4 client becomes courtesy client.
Oh, good point, yes, we don't want that waiter stuck waiting forever on
this expired client....

We need to have a way to
detect that someone is blocked on a lock owned by the v4 client and
do not allow that client to become courtesy client.  One way to handle
this to mark the v4 lock as 'has_waiter', and then before allowing
the expired v4 client to become courtesy client we need to search
all the locks of this v4 client for any lock with 'has_waiter' flag
and disallow it. The part that I don't like about this approach is
having to search all locks of each lockowner of the v4 client for
lock with 'has_waiter'.  I need some suggestions here.
I'm not seeing a way to do it without iterating over all the client's
locks.

ok, i feel a bit better :-)


I don't think you should need a new flag, though, shouldn't
!list_empty(&lock->fl_blocked_requests) be enough?

Thanks Bruce, this is what I was looking for.

-Dai


--b.

-Dai

Maybe something like:

@@ -1159,6 +1159,7 @@ static int posix_lock_inode(struct inode
*inode, struct file_lock *request,
          }
            percpu_down_read(&file_rwsem);
+retry:
          spin_lock(&ctx->flc_lock);
          /*
           * New lock request. Walk all POSIX locks and look for
conflicts. If
@@ -1169,6 +1170,11 @@ static int posix_lock_inode(struct inode
*inode, struct file_lock *request,
                  list_for_each_entry(fl, &ctx->flc_posix, fl_list) {
                          if (!posix_locks_conflict(request, fl))
                                  continue;
+                       if (fl->fl_lops->fl_expire_lock(fl, 1)) {
+ spin_unlock(&ctx->flc_lock);
+ fl->fl_lops->fl_expire_locks(fl, 0);
+                               goto retry;
+                       }
                          if (conflock)
                                  locks_copy_conflock(conflock, fl);
                          error = -EAGAIN;


where ->fl_expire_lock is a new lock callback with second
argument "check"
where:

     check = 1 means: just check whether this lock could be freed
     check = 0 means: go ahead and free this lock if you can
Thanks Bruce, I will look into this approach.

-Dai

--b.