If an NFS client has 10,000 delegations which are between 90 and 180 seconds old, and 10,000 which are between 180 and 270 seconds old, with none of them still in use, it is likely that the old ones are at the end of the list. The first 10,000 will not be marked to be returned, the last 10,000 will. To return these, the code starts at the front of the list and finds the first delegation needing to be returned (testing 10,000 on the way). Then it drops rcu_readlock(), returns the delegation and starts again, and does this 10,000 times. As delegation return is async, it may never block, so these 10,000 delegation will be returned without stopping for a breath. The soft-lock detector will notice and complain. This patch makes 3 changes. 1/ cond_resched() is added so that the soft-lockup detector doesn't notice. 2/ A place-holder (an inode) is kept when locks are dropped, so that the place can usually be found again after taking rcu_readlock(). This means we don't need to skip over 10,000 entries 10,000 times, 100 million pointless operations - which could eaisly be a larger number. 3/ If nfs_sb_active() fails, break out of the loop - there is no point in continuing. The patch also add list_for_each_entry_from_rcu() to rculist.h in order to achieve 2/. Signed-off-by: NeilBrown <neilb@xxxxxxxx> --- Hi, I'm hoping one of the RCU reviewers will provide an Acked-by for the rculist.h change. If you'ld like 3 patches instead of just one, please let me know. But they all see to fit well together. thanks, NeilBrown fs/nfs/delegation.c | 57 ++++++++++++++++++++++++++++++++++++++++++++----- include/linux/rculist.h | 10 +++++++++ 2 files changed, 62 insertions(+), 5 deletions(-) diff --git a/fs/nfs/delegation.c b/fs/nfs/delegation.c index 1819d0d0ba4b..c3d9e21ab440 100644 --- a/fs/nfs/delegation.c +++ b/fs/nfs/delegation.c @@ -483,19 +483,56 @@ static bool nfs_delegation_need_return(struct nfs_delegation *delegation) int nfs_client_return_marked_delegations(struct nfs_client *clp) { struct nfs_delegation *delegation; + struct nfs_delegation *prev; struct nfs_server *server; struct inode *inode; + struct inode *place_holder = NULL; int err = 0; restart: + /* + * To avoid quadratic looping we hold an reference + * to an inode place_holder. Each time we restart, we + * list nfs_servers from the server of that inode, and + * delegation in the server from the delegations of that + * inode. + * prev is an RCU-protected pointer to a delegation which + * wasn't marked for return and might be a good choice for + * the next place_holder. + */ rcu_read_lock(); - list_for_each_entry_rcu(server, &clp->cl_superblocks, client_link) { - list_for_each_entry_rcu(delegation, &server->delegations, - super_list) { - if (!nfs_delegation_need_return(delegation)) + prev = NULL; + if (place_holder) + server = NFS_SERVER(place_holder); + else + server = list_entry_rcu(clp->cl_superblocks.next, + struct nfs_server, client_link); + list_for_each_entry_from_rcu(server, &clp->cl_superblocks, client_link) { + delegation = NULL; + if (place_holder && server == NFS_SERVER(place_holder)) + delegation = rcu_dereference(NFS_I(place_holder)->delegation); + if (!delegation) + delegation = list_entry_rcu(server->delegations.next, + struct nfs_delegation, super_list); + list_for_each_entry_from_rcu(delegation, &server->delegations, super_list) { + struct inode *to_put = NULL; + + if (!nfs_delegation_need_return(delegation)) { + prev = delegation; continue; + } if (!nfs_sb_active(server->super)) - continue; + break; + + if (prev) { + struct inode *tmp; + tmp = nfs_delegation_grab_inode(prev); + if (tmp) { + to_put = place_holder; + place_holder = tmp; + } + } + inode = nfs_delegation_grab_inode(delegation); if (inode == NULL) { rcu_read_unlock(); @@ -505,16 +542,26 @@ int nfs_client_return_marked_delegations(struct nfs_client *clp) delegation = nfs_start_delegation_return_locked(NFS_I(inode)); rcu_read_unlock(); + if (to_put) { + iput(to_put); + to_put = NULL; + } + err = nfs_end_delegation_return(inode, delegation, 0); iput(inode); nfs_sb_deactive(server->super); + cond_resched(); if (!err) goto restart; set_bit(NFS4CLNT_DELEGRETURN, &clp->cl_state); + if (place_holder) + iput(place_holder); return err; } } rcu_read_unlock(); + if (place_holder) + iput(place_holder); return 0; } diff --git a/include/linux/rculist.h b/include/linux/rculist.h index 127f534fec94..2d86f9869842 100644 --- a/include/linux/rculist.h +++ b/include/linux/rculist.h @@ -403,6 +403,16 @@ static inline void list_splice_tail_init_rcu(struct list_head *list, &pos->member != (head); \ pos = list_entry_rcu(pos->member.next, typeof(*pos), member)) +/** + * list_for_each_entry_from_rcu - iterate over a list continuing from current point + * @pos: the type * to use as a loop cursor. + * @head: the head for your list. + * @member: the name of the list_node within the struct. + */ +#define list_for_each_entry_from_rcu(pos, head, member) \ + for (; &(pos)->member != (head); \ + pos = list_entry_rcu(pos->member.next, typeof(*(pos)), member)) + /** * hlist_del_rcu - deletes entry from hash list without re-initialization * @n: the element to delete from the hash list. -- 2.14.0.rc0.dirty
Attachment:
signature.asc
Description: PGP signature