On Thu, 2023-10-19 at 11:59 -0400, Benjamin Coddington wrote: > When the client is required to use TEST_STATEID to discover which > delegation(s) have been revoked, it may continually test delegations at the > head of the list if the server continues to be unsatisfied and send > SEQ4_STATUS_RECALLABLE_STATE_REVOKED. For a large number of delegations > this behavior is prone to live-lock because the client may never be able to > test and free revoked state at the end of the list since the > SEQ4_STATUS_RECALLABLE_STATE_REVOKED will cause us to flag delegations at > the head of the list to be tested. This problem is further exacerbated by > the state manager's willingness to be scheduled out on a busy system while > testing the list of delegations. > > Keep a generation counter for each attempt to test all delegations, and > skip delegations that have already been tested in the current pass. > > Signed-off-by: Benjamin Coddington <bcodding@xxxxxxxxxx> > --- > fs/nfs/delegation.c | 7 ++++++- > fs/nfs/delegation.h | 1 + > include/linux/nfs_fs_sb.h | 1 + > 3 files changed, 8 insertions(+), 1 deletion(-) > > -- > > Changed on v2: remove extra brackets that had my debug statements > > diff --git a/fs/nfs/delegation.c b/fs/nfs/delegation.c > index cf7365581031..fa1a14def45c 100644 > --- a/fs/nfs/delegation.c > +++ b/fs/nfs/delegation.c > @@ -448,6 +448,7 @@ int nfs_inode_set_delegation(struct inode *inode, const struct cred *cred, > delegation->cred = get_cred(cred); > delegation->inode = inode; > delegation->flags = 1<<NFS_DELEGATION_REFERENCED; > + delegation->test_gen = 0; > spin_lock_init(&delegation->lock); > > spin_lock(&clp->cl_lock); > @@ -1294,6 +1295,8 @@ static int nfs_server_reap_expired_delegations(struct nfs_server *server, > struct inode *inode; > const struct cred *cred; > nfs4_stateid stateid; > + unsigned long gen = ++server->delegation_gen; > + > restart: > rcu_read_lock(); > restart_locked: > @@ -1303,7 +1306,8 @@ static int nfs_server_reap_expired_delegations(struct nfs_server *server, > test_bit(NFS_DELEGATION_RETURNING, > &delegation->flags) || > test_bit(NFS_DELEGATION_TEST_EXPIRED, > - &delegation->flags) == 0) > + &delegation->flags) == 0 || > + delegation->test_gen == gen) > continue; > inode = nfs_delegation_grab_inode(delegation); > if (inode == NULL) > @@ -1312,6 +1316,7 @@ static int nfs_server_reap_expired_delegations(struct nfs_server *server, > cred = get_cred_rcu(delegation->cred); > nfs4_stateid_copy(&stateid, &delegation->stateid); > spin_unlock(&delegation->lock); > + delegation->test_gen = gen; > clear_bit(NFS_DELEGATION_TEST_EXPIRED, &delegation->flags); > rcu_read_unlock(); > nfs_delegation_test_free_expired(inode, &stateid, cred); > diff --git a/fs/nfs/delegation.h b/fs/nfs/delegation.h > index 1c378992b7c0..a6f495d012cf 100644 > --- a/fs/nfs/delegation.h > +++ b/fs/nfs/delegation.h > @@ -21,6 +21,7 @@ struct nfs_delegation { > fmode_t type; > unsigned long pagemod_limit; > __u64 change_attr; > + unsigned long test_gen; > unsigned long flags; > refcount_t refcount; > spinlock_t lock; > diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h > index 20eeba8b009d..2f9d380b3439 100644 > --- a/include/linux/nfs_fs_sb.h > +++ b/include/linux/nfs_fs_sb.h > @@ -238,6 +238,7 @@ struct nfs_server { > struct list_head delegations; > struct list_head ss_copies; > > + unsigned long delegation_gen; > unsigned long mig_gen; > unsigned long mig_status; > #define NFS_MIG_IN_TRANSITION (1) Looks like a reasonable thing to do. Reviewed-by: Jeff Layton <jlayton@xxxxxxxxxx>