On Tue, 2017-08-01 at 10:20 -0700, Linus Torvalds wrote:
> On Tue, Aug 1, 2017 at 8:51 AM, davej@xxxxxxxxxxxxxxxxx
> <davej@xxxxxxxxxxxxxxxxx> wrote:
> > On Mon, Jul 31, 2017 at 10:35:45PM -0700, Linus Torvalds wrote:
> > > Any chance of getting the output from
> > >
> > >    ./scripts/faddr2line vmlinux nfs4_exchange_id_done+0x3d7/0x8e0
> >
> > Hm, that points to this..
> >
> > 7463         /* Save the EXCHANGE_ID verifier session trunk tests */
> > 7464         memcpy(clp->cl_confirm.data, cdata->args.verifier->data,
> > 7465                sizeof(clp->cl_confirm.data));
>
> Ok, that certainly made no sense to me, because the KASAN report made
> it look like a stale pathname access (allocated in getname, freed in
> putname), but I think the issue is more fundamental than that.
>
> That cdata->args.verifier seems to be entirely broken. At least for
> the "xprt == NULL" case, it does the following:
>
>  - use the address of a local variable ("&verifier")
>
>  - wait for the rpc completion using rpc_wait_for_completion_task().
>
> That's unacceptably buggy crap. rpc_wait_for_completion_task() will
> happily exit on a deadly signal even if the rpc hasn't been completed,
> so now you'll have a stale pointer to a stack that has been freed.
>
> So I think the 'pathname' part may actually be entirely a red herring,
> and it's the underlying access itself that just picks up a random
> pointer from a stack that now contains something different. And KASAN
> didn't notice the stale stack access itself, because the stack slot is
> still valid - it's just no longer the original 'verifier' allocation.
>
> Or *something* like that.
>
> None of this looks even remotely new, though - the code seems to go
> back to 2009. Have you just changed what you're testing to trigger
> these things?
>
> I'm not even sure why it does that stupid stack allocation. It does a
> *real* allocation just a few lines later:
>
>         struct nfs41_exchange_id_data *calldata
>         ...
>         calldata = kzalloc(sizeof(*calldata), GFP_NOFS);
>
> and the whole verifier structure could easily have been part of that
> same allocation as far as I can tell.
>
> And that really might seem to be the right thing to do.
>
> TOTALLY UNTESTED PROBABLY COMPLETE CRAP patch attached.
>
> That patch compiles for me. It *might* even work. Or it might just be
> the ramblings of a diseased mind.
>
> Anna? Trond?
>

I came to the same conclusion yesterday, and have a stable patch that
does something similar. I just got distracted with the other bugs that
were introduced by the exchangeid patch series in Linux-4.9 (including
what looks like a duplicate free issue in nfs4_test_session_trunk()).

I can pass a few of the more critical patches on to Anna for merging in
this cycle, then I've got some cleanups ready for the 4.14 merge
window.

Cheers
  Trond

-- 
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@xxxxxxxxxxxxxxx
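
For anyone following the thread, the lifetime problem and the suggested
direction (make the verifier part of the heap-allocated call data rather
than pointing at a stack local) can be sketched in plain user-space C.
This is only an illustration under stated assumptions, not the actual
patch: struct verifier, struct calldata_borrowed, struct calldata_owned,
calldata_alloc(), calldata_done() and start_exchange() are hypothetical
stand-ins, and calloc() stands in for kzalloc().

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins; these are NOT the real NFS client structures. */
#define VERIFIER_SIZE 8

struct verifier {
	unsigned char data[VERIFIER_SIZE];
};

/*
 * Problematic shape: the call data only borrows the verifier through a
 * pointer. If that pointer refers to a caller's local variable and the
 * caller returns before the completion callback runs (for example,
 * because its wait was interrupted), the callback reads a stale stack slot.
 */
struct calldata_borrowed {
	const struct verifier *verifier;
};

/*
 * Safer shape: the verifier is embedded in the call data, so it lives
 * exactly as long as the heap object the callback receives.
 */
struct calldata_owned {
	struct verifier verifier;
	unsigned char confirm[VERIFIER_SIZE];
};

/* Allocate zeroed call data and copy the verifier into it. */
struct calldata_owned *calldata_alloc(const struct verifier *v)
{
	struct calldata_owned *cd = calloc(1, sizeof(*cd));

	if (!cd)
		return NULL;
	cd->verifier = *v;	/* copy by value, no pointer to the caller's stack */
	return cd;
}

/* Completion callback: touches only memory owned by the call data. */
void calldata_done(struct calldata_owned *cd)
{
	memcpy(cd->confirm, cd->verifier.data, sizeof(cd->confirm));
}

/*
 * Caller: the local verifier goes out of scope when this function
 * returns, but the call data already holds its own copy.
 */
struct calldata_owned *start_exchange(void)
{
	struct verifier v;

	memset(v.data, 0xAB, sizeof(v.data));
	return calldata_alloc(&v);
}
```

With the embedded copy, calldata_done() can run after start_exchange()
has returned, which is the situation an interrupted wait creates. The
caller still has to free the call data exactly once.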