On 9/22/21 6:18 PM, J. Bruce Fields wrote:
On Wed, Sep 22, 2021 at 03:16:34PM -0700, dai.ngo@xxxxxxxxxx wrote:
On 9/22/21 2:14 PM, J. Bruce Fields wrote:
On Thu, Sep 16, 2021 at 02:22:11PM -0400, Dai Ngo wrote:
+/*
+ * If the conflict happens due to a NFSv4 request then check for
+ * courtesy client and set rq_conflict_client so that upper layer
+ * can destroy the conflict client and retry the call.
+ */
+static bool
+nfsd_check_courtesy_client(struct nfs4_delegation *dp)
+{
+ struct svc_rqst *rqst;
+ struct nfs4_client *clp = dp->dl_recall.cb_clp;
+ struct nfsd_net *nn = net_generic(clp->net, nfsd_net_id);
+ bool ret = false;
+
+ if (!i_am_nfsd()) {
+ if (test_bit(NFSD4_COURTESY_CLIENT, &clp->cl_flags)) {
+ set_bit(NFSD4_DESTROY_COURTESY_CLIENT, &clp->cl_flags);
+ mod_delayed_work(laundry_wq, &nn->laundromat_work, 0);
+ return true;
+ }
+ return false;
+ }
+ rqst = kthread_data(current);
+ if (rqst->rq_prog != NFS_PROGRAM || rqst->rq_vers < 4)
+ return false;
+ rqst->rq_conflict_client = NULL;
+
+ spin_lock(&nn->client_lock);
+ if (test_bit(NFSD4_COURTESY_CLIENT, &clp->cl_flags) &&
+ !mark_client_expired_locked(clp)) {
+ rqst->rq_conflict_client = clp;
+ ret = true;
+ }
+ spin_unlock(&nn->client_lock);
Check whether this is safe; I think the flc_lock may be taken inside of
this lock elsewhere, resulting in a potential deadlock?
rqst doesn't need any locking as it's only being used by this thread, so
it's the client expiration stuff that's the problem, I guess.
mark_client_expired_locked needs to acquire cl_lock. I think the lock
ordering is ok, client_lock -> cl_lock. nfsd4_exchange_id uses this
lock ordering.
It's flc_lock (see locks.c) that I'm worried about. I've got a lockdep
warning here, taking a closer look....
nfsd4_release_lockowner takes clp->cl_lock and then fcl_lock.
Here we're taking fcl_lock and then client_lock.
As you say, elsewhere client_lock is taken and then cl_lock.
So that's the loop, I think.
Thanks Bruce, I see the deadlock. We will need a new approach for this.
-Dai
--b.