On Mon, 13 Jul 2020, Chuck Lever wrote: > > > > On Jul 13, 2020, at 8:29 AM, Scott Mayhew <smayhew@xxxxxxxxxx> wrote: > > > > On Sun, 12 Jul 2020, Chuck Lever wrote: > > > >> Hi Scott- > >> > >>> On Jul 10, 2020, at 4:33 PM, Scott Mayhew <smayhew@xxxxxxxxxx> wrote: > >>> > >>> If the rpc_pipefs is unmounted, then the rpc_pipe->dentry becomes NULL > >>> and dereferencing the dentry->d_sb will trigger an oops. The only > >>> reason we're doing that is to determine the nfsd_net, which could > >>> instead be passed in by the caller. So do that instead. > >>> > >>> Signed-off-by: Scott Mayhew <smayhew@xxxxxxxxxx> > >> > >> Looks straightforward. Applied to the nfsd-5.9 testing tree. > >> > >> I'm wondering about automatic backports to stable. This fix does not > >> apply to kernels before v5.6, but IIUC addresses a crash introduced > >> in f3f8014862d8 ("nfsd: add the infrastructure to handle the cld upcall") > >> [2012] ? > >> > >> Is "Cc: <stable@xxxxxxxxxx> # v5.6+" appropriate? > >> > > I think it would be 11a60d159259 ("nfsd: add a "GetVersion" upcall for > > nfsdcld"), so it would only need to go back to v5.4... and to get the > > patch to apply, in the hunk that modifies nfsd4_cld_grace_done_v0() > > you would need to add a cast of nn->boot_time to int64_t (which > > 9cc7680149b2 "nfsd: make 'boot_time' 64-bit wide" removed). > > Thanks for the detailed background! > > Then instead of "Cc: stable", can I add: > > Fixes: 11a60d159259 ("nfsd: add a "GetVersion" upcall for nfsdcld") ? Yes, please do that. > > > >> Also, is there a bug report that documents the crash? > > > > Our QE hit the crash internally while running xfstests on a pNFS SCSI > > setup. I don't think either of those is relevant aside from the fact > > that several nfsd threads where stuck on xfs while calling > > nfsd4_scsi_proc_layoutcommit(). I think what happened is that the job > > timed out and the system was in the process of being shut down when the > > oops occurred, and the rpc_pipefs was unmounted out from under nfsd: > > > > crash> bt > > PID: 39572 TASK: ffff8a78f67fddc0 CPU: 1 COMMAND: "kworker/u4:0" > > #0 [ffffa39a026b3ae0] machine_kexec at ffffffffae65b55e > > #1 [ffffa39a026b3b38] __crash_kexec at ffffffffae75ea5d > > #2 [ffffa39a026b3c00] crash_kexec at ffffffffae75f93d > > #3 [ffffa39a026b3c18] oops_end at ffffffffae622c4d > > #4 [ffffa39a026b3c38] no_context at ffffffffae66aa2e > > #5 [ffffa39a026b3c90] do_page_fault at ffffffffae66b552 > > #6 [ffffa39a026b3cc0] async_page_fault at ffffffffaf00123e > > [exception RIP: __cld_pipe_upcall+0x3d] > > RIP: ffffffffc06b5ded RSP: ffffa39a026b3d70 RFLAGS: 00010246 > > RAX: 0000000000000000 RBX: ffff8a78ceaf7000 RCX: 000000000000000b > > RDX: ffff8a78ceaf7000 RSI: ffffa39a026b3d70 RDI: ffffa39a026b3d70 > > RBP: ffffa39a026b3dc0 R8: ffff8a78f37aac50 R9: ffff8a78c7c02800 > > R10: 8080808080808080 R11: 0000ec193f7575c0 R12: ffff8a78c8ef3038 > > R13: ffff8a78d3156220 R14: ffffa39a026b3e50 R15: ffff8a78d3156220 > > ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 > > #7 [ffffa39a026b3dc8] nfsd4_cld_remove at ffffffffc06b7b70 [nfsd] > > #8 [ffffa39a026b3df8] expire_client at ffffffffc06aa4d6 [nfsd] > > #9 [ffffa39a026b3e08] laundromat_main at ffffffffc06aa799 [nfsd] > > #10 [ffffa39a026b3e98] process_one_work at ffffffffae6d1cf7 > > #11 [ffffa39a026b3ed8] worker_thread at ffffffffae6d23c0 > > #12 [ffffa39a026b3f10] kthread at ffffffffae6d7d02 > > #13 [ffffa39a026b3f50] ret_from_fork at ffffffffaf000255 > > > > crash> dis -lr __cld_pipe_upcall+0x3d > > ... > > /usr/src/debug/kernel-4.18.0-211.el8/linux-4.18.0-211.el8.x86_64/fs/nfsd/nfs4recover.c: 764 > > 0xffffffffc06b5de0 <__cld_pipe_upcall+0x30>: mov 0x108(%rdi),%rax > > 0xffffffffc06b5de7 <__cld_pipe_upcall+0x37>: mov %rsp,%rsi > > 0xffffffffc06b5dea <__cld_pipe_upcall+0x3a>: mov %rsi,%rdi > > 0xffffffffc06b5ded <__cld_pipe_upcall+0x3d>: mov 0x68(%rax),%rax > > > > That puts us here: > > > > static int > > __cld_pipe_upcall(struct rpc_pipe *pipe, void *cmsg) > > { > > int ret; > > struct rpc_pipe_msg msg; > > struct cld_upcall *cup = container_of(cmsg, struct cld_upcall, cu_u); > > struct nfsd_net *nn = net_generic(pipe->dentry->d_sb->s_fs_info, > > nfsd_net_id); > > ... > > crash> rpc_pipe.dentry -o > > struct rpc_pipe { > > [0x108] struct dentry *dentry; > > } > > crash> dentry.d_sb -o > > struct dentry { > > [0x68] struct super_block *d_sb; > > } > > > > If the rpc_pipe->dentry was NULL, that would explain the NULL pointer dereference at 0000000000000068. Let's confirm. > > > > crash> dis -lr ffffffffc06b7b70 > > ... > > /usr/src/debug/kernel-4.18.0-211.el8/linux-4.18.0-211.el8.x86_64/fs/nfsd/nfs4recover.c: 794 > > 0xffffffffc06b7b65 <nfsd4_cld_remove+0x85>: mov %rbp,%rsi > > 0xffffffffc06b7b68 <nfsd4_cld_remove+0x88>: mov %rbx,%rdi <------------------------------rpc_pipe > > 0xffffffffc06b7b6b <nfsd4_cld_remove+0x8b>: callq 0xffffffffc06b5db0 <__cld_pipe_upcall> > > /usr/src/debug/kernel-4.18.0-211.el8/linux-4.18.0-211.el8.x86_64/fs/nfsd/nfs4recover.c: 795 > > 0xffffffffc06b7b70 <nfsd4_cld_remove+0x90>: cmp $0xfffffff5,%eax > > > > crash> rpc_pipe.dentry ffff8a78ceaf7000 > > dentry = 0x0 > > > > We can also see that the rpc_pipefs isn't mounted: > > > > crash> mount|grep pipefs > > (nothing) > > > > And none of the daemons that require the rpc_pipefs are running: > > > > crash> ps|grep nfsdcld > > crash> ps|grep idmapd > > crash> ps|grep gssd > > crash> ps|grep blkmapd > > (nothing) > > > > I think part of the problem is that the nfsdcld.service unit file has > > "Requires=rpc_pipefs.target", but the nfs-server.service unit file only > > has "Wants=nfsdcld.service". It can't be "Requires" because using the > > nfsdcld daemon for client tracking is optional. > > > > -Scott > > > >> > >> > >>> --- > >>> fs/nfsd/nfs4recover.c | 24 +++++++++++------------- > >>> 1 file changed, 11 insertions(+), 13 deletions(-) > >>> > >>> diff --git a/fs/nfsd/nfs4recover.c b/fs/nfsd/nfs4recover.c > >>> index 9e40dfecf1b1..186fa2c2c6ba 100644 > >>> --- a/fs/nfsd/nfs4recover.c > >>> +++ b/fs/nfsd/nfs4recover.c > >>> @@ -747,13 +747,11 @@ struct cld_upcall { > >>> }; > >>> > >>> static int > >>> -__cld_pipe_upcall(struct rpc_pipe *pipe, void *cmsg) > >>> +__cld_pipe_upcall(struct rpc_pipe *pipe, void *cmsg, struct nfsd_net *nn) > >>> { > >>> int ret; > >>> struct rpc_pipe_msg msg; > >>> struct cld_upcall *cup = container_of(cmsg, struct cld_upcall, cu_u); > >>> - struct nfsd_net *nn = net_generic(pipe->dentry->d_sb->s_fs_info, > >>> - nfsd_net_id); > >>> > >>> memset(&msg, 0, sizeof(msg)); > >>> msg.data = cmsg; > >>> @@ -773,7 +771,7 @@ __cld_pipe_upcall(struct rpc_pipe *pipe, void *cmsg) > >>> } > >>> > >>> static int > >>> -cld_pipe_upcall(struct rpc_pipe *pipe, void *cmsg) > >>> +cld_pipe_upcall(struct rpc_pipe *pipe, void *cmsg, struct nfsd_net *nn) > >>> { > >>> int ret; > >>> > >>> @@ -782,7 +780,7 @@ cld_pipe_upcall(struct rpc_pipe *pipe, void *cmsg) > >>> * upcalls queued. > >>> */ > >>> do { > >>> - ret = __cld_pipe_upcall(pipe, cmsg); > >>> + ret = __cld_pipe_upcall(pipe, cmsg, nn); > >>> } while (ret == -EAGAIN); > >>> > >>> return ret; > >>> @@ -1115,7 +1113,7 @@ nfsd4_cld_create(struct nfs4_client *clp) > >>> memcpy(cup->cu_u.cu_msg.cm_u.cm_name.cn_id, clp->cl_name.data, > >>> clp->cl_name.len); > >>> > >>> - ret = cld_pipe_upcall(cn->cn_pipe, &cup->cu_u.cu_msg); > >>> + ret = cld_pipe_upcall(cn->cn_pipe, &cup->cu_u.cu_msg, nn); > >>> if (!ret) { > >>> ret = cup->cu_u.cu_msg.cm_status; > >>> set_bit(NFSD4_CLIENT_STABLE, &clp->cl_flags); > >>> @@ -1180,7 +1178,7 @@ nfsd4_cld_create_v2(struct nfs4_client *clp) > >>> } else > >>> cmsg->cm_u.cm_clntinfo.cc_princhash.cp_len = 0; > >>> > >>> - ret = cld_pipe_upcall(cn->cn_pipe, cmsg); > >>> + ret = cld_pipe_upcall(cn->cn_pipe, cmsg, nn); > >>> if (!ret) { > >>> ret = cmsg->cm_status; > >>> set_bit(NFSD4_CLIENT_STABLE, &clp->cl_flags); > >>> @@ -1218,7 +1216,7 @@ nfsd4_cld_remove(struct nfs4_client *clp) > >>> memcpy(cup->cu_u.cu_msg.cm_u.cm_name.cn_id, clp->cl_name.data, > >>> clp->cl_name.len); > >>> > >>> - ret = cld_pipe_upcall(cn->cn_pipe, &cup->cu_u.cu_msg); > >>> + ret = cld_pipe_upcall(cn->cn_pipe, &cup->cu_u.cu_msg, nn); > >>> if (!ret) { > >>> ret = cup->cu_u.cu_msg.cm_status; > >>> clear_bit(NFSD4_CLIENT_STABLE, &clp->cl_flags); > >>> @@ -1261,7 +1259,7 @@ nfsd4_cld_check_v0(struct nfs4_client *clp) > >>> memcpy(cup->cu_u.cu_msg.cm_u.cm_name.cn_id, clp->cl_name.data, > >>> clp->cl_name.len); > >>> > >>> - ret = cld_pipe_upcall(cn->cn_pipe, &cup->cu_u.cu_msg); > >>> + ret = cld_pipe_upcall(cn->cn_pipe, &cup->cu_u.cu_msg, nn); > >>> if (!ret) { > >>> ret = cup->cu_u.cu_msg.cm_status; > >>> set_bit(NFSD4_CLIENT_STABLE, &clp->cl_flags); > >>> @@ -1404,7 +1402,7 @@ nfsd4_cld_grace_start(struct nfsd_net *nn) > >>> } > >>> > >>> cup->cu_u.cu_msg.cm_cmd = Cld_GraceStart; > >>> - ret = cld_pipe_upcall(cn->cn_pipe, &cup->cu_u.cu_msg); > >>> + ret = cld_pipe_upcall(cn->cn_pipe, &cup->cu_u.cu_msg, nn); > >>> if (!ret) > >>> ret = cup->cu_u.cu_msg.cm_status; > >>> > >>> @@ -1432,7 +1430,7 @@ nfsd4_cld_grace_done_v0(struct nfsd_net *nn) > >>> > >>> cup->cu_u.cu_msg.cm_cmd = Cld_GraceDone; > >>> cup->cu_u.cu_msg.cm_u.cm_gracetime = nn->boot_time; > >>> - ret = cld_pipe_upcall(cn->cn_pipe, &cup->cu_u.cu_msg); > >>> + ret = cld_pipe_upcall(cn->cn_pipe, &cup->cu_u.cu_msg, nn); > >>> if (!ret) > >>> ret = cup->cu_u.cu_msg.cm_status; > >>> > >>> @@ -1460,7 +1458,7 @@ nfsd4_cld_grace_done(struct nfsd_net *nn) > >>> } > >>> > >>> cup->cu_u.cu_msg.cm_cmd = Cld_GraceDone; > >>> - ret = cld_pipe_upcall(cn->cn_pipe, &cup->cu_u.cu_msg); > >>> + ret = cld_pipe_upcall(cn->cn_pipe, &cup->cu_u.cu_msg, nn); > >>> if (!ret) > >>> ret = cup->cu_u.cu_msg.cm_status; > >>> > >>> @@ -1524,7 +1522,7 @@ nfsd4_cld_get_version(struct nfsd_net *nn) > >>> goto out_err; > >>> } > >>> cup->cu_u.cu_msg.cm_cmd = Cld_GetVersion; > >>> - ret = cld_pipe_upcall(cn->cn_pipe, &cup->cu_u.cu_msg); > >>> + ret = cld_pipe_upcall(cn->cn_pipe, &cup->cu_u.cu_msg, nn); > >>> if (!ret) { > >>> ret = cup->cu_u.cu_msg.cm_status; > >>> if (ret) > >>> -- > >>> 2.25.4 > >>> > >> > >> -- > >> Chuck Lever > > -- > Chuck Lever > > >