On Tue, 2017-11-07 at 13:26 -0500, Scott Mayhew wrote: > On Tue, 07 Nov 2017, Trond Myklebust wrote: > > > On Tue, 2017-11-07 at 09:29 -0500, Scott Mayhew wrote: > > > The following deadlock can occur between a process waiting for a > > > client > > > to initialize in while walking the client list and another > > > process > > > waiting for the nfs_clid_init_mutex so it can initialize that > > > client: > > > > > > Process 1 Process 2 > > > --------- --------- > > > spin_lock(&nn->nfs_client_lock); > > > list_add_tail(&CLIENTA->cl_share_link, > > > &nn->nfs_client_list); > > > spin_unlock(&nn->nfs_client_lock); > > > spin_lock(&nn- > > > > nfs_client_lock); > > > > > > list_add_tail(&CLIENTB- > > > > cl_share_link, > > > > > > &nn- > > > > nfs_client_list); > > > > > > spin_unlock(&nn- > > > > nfs_client_lock); > > > > > > mutex_lock(&nfs_clid_init > > > _mut > > > ex); > > > nfs41_walk_client_list(cl > > > p, > > > result, cred); > > > nfs_wait_client_init_comp > > > lete > > > (CLIENTA); > > > (waiting for nfs_clid_init_mutex) > > > > > > Make sure nfs_match_client() only evaluates clients that have > > > completed > > > initialization in order to prevent that deadlock. > > > > > > Signed-off-by: Scott Mayhew <smayhew@xxxxxxxxxx> > > > --- > > > fs/nfs/client.c | 9 +++++++++ > > > 1 file changed, 9 insertions(+) > > > > > > diff --git a/fs/nfs/client.c b/fs/nfs/client.c > > > index 22880ef..8b093994 100644 > > > --- a/fs/nfs/client.c > > > +++ b/fs/nfs/client.c > > > @@ -291,12 +291,21 @@ static struct nfs_client > > > *nfs_match_client(const struct nfs_client_initdata *dat > > > const struct sockaddr *sap = data->addr; > > > struct nfs_net *nn = net_generic(data->net, nfs_net_id); > > > > > > +again: > > > list_for_each_entry(clp, &nn->nfs_client_list, > > > cl_share_link) { > > > const struct sockaddr *clap = (struct sockaddr > > > *)&clp->cl_addr; > > > /* Don't match clients that failed to initialise > > > properly */ > > > if (clp->cl_cons_state < 0) > > > continue; > > > > > > + if (clp->cl_minorversion > 0 && > > > + clp->cl_cons_state > > > > NFS_CS_READY) { > > > + spin_unlock(&nn->nfs_client_lock); > > > + nfs_wait_client_init_complete(clp); > > > + spin_lock(&nn->nfs_client_lock); > > > + goto again; > > > + } > > > + > > > /* Different NFS versions cannot share the same > > > nfs_client */ > > > if (clp->rpc_ops != data->nfs_mod->rpc_ops) > > > continue; > > > > Why the test for clp->cl_minorversion? What's so minor version > > specific > > about any of this? > > The deadlock doesn't occur with v4.0 clients because those are being > marked NFS_CS_READY in nfs4_client_client(), before the trunking > detection Sure, but the root cause you are asserting is that the nfs_client has not finished initialising. What is minorversion-specific (or even NFSv4-specific) about that? -- Trond Myklebust Linux NFS client maintainer, PrimaryData trond.myklebust@xxxxxxxxxxxxxxx ��.n��������+%������w��{.n�����{��w���jg��������ݢj����G�������j:+v���w�m������w�������h�����٥