Re: [PATCH v3 2/3] nfsd: Initial implementation of NFSv4 Courteous Server

dai.ngo@xxxxxxxxxx · Thu, 23 Sep 2021 10:09:07 -0700

On 9/22/21 6:18 PM, J. Bruce Fields wrote:
On Wed, Sep 22, 2021 at 03:16:34PM -0700, dai.ngo@xxxxxxxxxx wrote:
On 9/22/21 2:14 PM, J. Bruce Fields wrote:
On Thu, Sep 16, 2021 at 02:22:11PM -0400, Dai Ngo wrote:
+/*
+ * If the conflict happens due to a NFSv4 request then check for
+ * courtesy client and set rq_conflict_client so that upper layer
+ * can destroy the conflict client and retry the call.
+ */
+static bool
+nfsd_check_courtesy_client(struct nfs4_delegation *dp)
+{
+	struct svc_rqst *rqst;
+	struct nfs4_client *clp = dp->dl_recall.cb_clp;
+	struct nfsd_net *nn = net_generic(clp->net, nfsd_net_id);
+	bool ret = false;
+
+	if (!i_am_nfsd()) {
+		if (test_bit(NFSD4_COURTESY_CLIENT, &clp->cl_flags)) {
+			set_bit(NFSD4_DESTROY_COURTESY_CLIENT, &clp->cl_flags);
+			mod_delayed_work(laundry_wq, &nn->laundromat_work, 0);
+			return true;
+		}
+		return false;
+	}
+	rqst = kthread_data(current);
+	if (rqst->rq_prog != NFS_PROGRAM || rqst->rq_vers < 4)
+		return false;
+	rqst->rq_conflict_client = NULL;
+
+	spin_lock(&nn->client_lock);
+	if (test_bit(NFSD4_COURTESY_CLIENT, &clp->cl_flags) &&
+				!mark_client_expired_locked(clp)) {
+		rqst->rq_conflict_client = clp;
+		ret = true;
+	}
+	spin_unlock(&nn->client_lock);
Check whether this is safe; I think the flc_lock may be taken inside of
this lock elsewhere, resulting in a potential deadlock?

rqst doesn't need any locking as it's only being used by this thread, so
it's the client expiration stuff that's the problem, I guess.
mark_client_expired_locked needs to acquire cl_lock. I think the lock
ordering is ok, client_lock -> cl_lock. nfsd4_exchange_id uses this
lock ordering.
It's flc_lock (see locks.c) that I'm worried about.  I've got a lockdep
warning here, taking a closer look....

nfsd4_release_lockowner takes clp->cl_lock and then fcl_lock.

Here we're taking fcl_lock and then client_lock.

As you say, elsewhere client_lock is taken and then cl_lock.

So that's the loop, I think.

Thanks Bruce, I see the deadlock. We will need a new approach for this.

-Dai

--b.