On Tue, Aug 1, 2017 at 8:51 AM, davej@xxxxxxxxxxxxxxxxx <davej@xxxxxxxxxxxxxxxxx> wrote: > On Mon, Jul 31, 2017 at 10:35:45PM -0700, Linus Torvalds wrote: > > Any chance of getting the output from > > > > ./scripts/faddr2line vmlinux nfs4_exchange_id_done+0x3d7/0x8e0 > > > Hm, that points to this.. > > 7463 /* Save the EXCHANGE_ID verifier session trunk tests */ > 7464 memcpy(clp->cl_confirm.data, cdata->args.verifier->data, > 7465 sizeof(clp->cl_confirm.data)); Ok, that certainly made no sense to me, because the KASAN report made it look like a stale pathname access (allocated in getname, freed in putname), but I think the issue is more fundamental than that. That cdata->args.verifier seems to be entirely broken. AT least for the "xprt == NULL" case, it does the following: - use the address of a local variable ("&verifier") - wait for the rpc completion using rpc_wait_for_completion_task(). That's unacceptably buggy crap. rpc_wait_for_completion_task() will happily exit on a deadly signal even if the rpc hasn't been completed, so now you'll have a stale pointer to a stack that has been freed. So I think the 'pathname' part may actually be entirely a red herring, and it's the underlying access itself that just picks up a random pointer from a stack that now contains something different. And KASAN didn't notice the stale stack access itself, because the stack slot is still valid - it's just no longer the original 'verifier' allocation. Or *something* like that. None of this looks even remotely new, though - the code seems to go back to 2009. Have you just changed what you're testing to trigger these things? I'm not even sure why it does that stupid stack allocation. It does a *real* allocation just a few lines later: struct nfs41_exchange_id_data *calldata ... calldata = kzalloc(sizeof(*calldata), GFP_NOFS); and the whole verifier structure could easily have been part of that same allocation as far as I can tell. And that really might seem to be the right thing to do. TOTALLY UNTESTED PROBABLY COMPLETE CRAP patch attatched. That patch compiles for me. It *might* even work. Or it might just be the ramblings of a diseased mind. Anna? Trond? So caveat probatorem, Linus
fs/nfs/nfs4proc.c | 19 ++++++++++++++----- 1 file changed, 14 insertions(+), 5 deletions(-) diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c index 18ca6879d8de..0712af3d38f8 100644 --- a/fs/nfs/nfs4proc.c +++ b/fs/nfs/nfs4proc.c @@ -7490,6 +7490,11 @@ static const struct rpc_call_ops nfs4_exchange_id_call_ops = { .rpc_release = nfs4_exchange_id_release, }; +struct verifier_and_calldata { + struct nfs41_exchange_id_data calldata; + nfs4_verifier verifier; +}; + /* * _nfs4_proc_exchange_id() * @@ -7498,7 +7503,8 @@ static const struct rpc_call_ops nfs4_exchange_id_call_ops = { static int _nfs4_proc_exchange_id(struct nfs_client *clp, struct rpc_cred *cred, u32 sp4_how, struct rpc_xprt *xprt) { - nfs4_verifier verifier; + struct verifier_and_calldata *vna; + nfs4_verifier *verifier; struct rpc_message msg = { .rpc_proc = &nfs4_procedures[NFSPROC4_CLNT_EXCHANGE_ID], .rpc_cred = cred, @@ -7516,14 +7522,17 @@ static int _nfs4_proc_exchange_id(struct nfs_client *clp, struct rpc_cred *cred, if (!atomic_inc_not_zero(&clp->cl_count)) return -EIO; - calldata = kzalloc(sizeof(*calldata), GFP_NOFS); - if (!calldata) { + vna = kzalloc(sizeof(*vna), GFP_NOFS); + if (!vna) { nfs_put_client(clp); return -ENOMEM; } + /* kfree() of calldata will also free the verifier */ + calldata = &vna->calldata; + verifier = &vna->verifier; if (!xprt) - nfs4_init_boot_verifier(clp, &verifier); + nfs4_init_boot_verifier(clp, verifier); status = nfs4_init_uniform_client_string(clp); if (status) @@ -7566,7 +7575,7 @@ static int _nfs4_proc_exchange_id(struct nfs_client *clp, struct rpc_cred *cred, RPC_TASK_SOFT|RPC_TASK_SOFTCONN|RPC_TASK_ASYNC; calldata->args.verifier = &clp->cl_confirm; } else { - calldata->args.verifier = &verifier; + calldata->args.verifier = verifier; } calldata->args.client = clp; #ifdef CONFIG_NFS_V4_1_MIGRATION