On Tue, Feb 12, 2019 at 03:37:20PM +0100, Greg Kroah-Hartman wrote:
> On Tue, Feb 12, 2019 at 02:31:14PM +0000, David Howells wrote:
> > I've bisected an oops that occurs in rpc_clnt_debugfs_register() trying to
> > dereference a pointer with -EACCES in it.  This is the causing commit, though
> > I suspect the bug is in sunrpc expecting to see NULL rather than an error.
> >
> > ff9fb72bc07705c00795ca48631f7fffe24d2c6b is the first bad commit
> > commit ff9fb72bc07705c00795ca48631f7fffe24d2c6b
> > Author: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>
> > Date:   Wed Jan 23 11:28:14 2019 +0100
> >
> >     debugfs: return error values, not NULL
> >
> >     When an error happens, debugfs should return an error pointer value, not
> >     NULL.  This will prevent the totally theoretical error where a debugfs
> >     call fails due to lack of memory, returning NULL, and that dentry value
> >     is then passed to another debugfs call, which would end up succeeding,
> >     creating a file at the root of the debugfs tree, but would then be
> >     impossible to remove (because you can not remove the directory NULL).
> >
> >     So, to make everyone happy, always return errors, this makes the users
> >     of debugfs much simpler (they do not have to ever check the return
> >     value), and everyone can rest easy.
> > ...
> >
> > The attached oops occurs during boot from the gssproxy process in
> > rpc_clnt_debugfs_register().  The code at this point is:
> >
> >    0xffffffff8195cbdd <+450>:   mov    0x50(%rax),%rcx    <--- oopsing
> >    0xffffffff8195cbe1 <+454>:   mov    $0xffffffff821cc8ba,%rdx
> >    0xffffffff8195cbe8 <+461>:   mov    $0x18,%esi
> >    0xffffffff8195cbed <+466>:   lea    -0x30(%rbp),%rdi
> >    0xffffffff8195cbf1 <+470>:   callq  0xffffffff819db773 <snprintf>
> >
> > RAX is -EACCES.
> >
> > Looking in the source:
> >
> >         len = snprintf(name, sizeof(name), "../../rpc_xprt/%s",
> >                        xprt->debugfs->d_name.name);
> >
> > I think xprt->debugfs is the value in RAX.
> >
> >         (gdb) p &((struct dentry *)0)->d_name.name
> >         $5 = (const unsigned char **) 0x50 <irq_stack_union+80>
> >
> > which matches the offset on the oopsing MOV instruction.
> >
> > This is with linus/master (aa0c38cf39de73bf7360a3da8f1707601261e518).
>
> Ugh, yeah, I see the problem, sorry about that.
>
> I wonder why the debugfs call is always failing, that's not good...
>
> let me dig and see if I already have a patch for this...

I have a much larger cleanup patch for this code, but this single line
change should solve the issue for now.  Can you test it to verify?

thanks,

greg k-h

------------------

diff --git a/net/sunrpc/debugfs.c b/net/sunrpc/debugfs.c
index 45a033329cd4..19bb356230ed 100644
--- a/net/sunrpc/debugfs.c
+++ b/net/sunrpc/debugfs.c
@@ -146,7 +146,7 @@ rpc_clnt_debugfs_register(struct rpc_clnt *clnt)
 	rcu_read_lock();
 	xprt = rcu_dereference(clnt->cl_xprt);
 	/* no "debugfs" dentry? Don't bother with the symlink. */
-	if (!xprt->debugfs) {
+	if (IS_ERR_OR_NULL(xprt->debugfs)) {
 		rcu_read_unlock();
 		return;
 	}
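
For anyone following along, here is a rough user-space sketch of why the old
"!xprt->debugfs" test lets an error pointer through while IS_ERR_OR_NULL()
catches it. The macros below are simplified stand-ins written for this
illustration (the real ones are in include/linux/err.h), and
old_check_skips(), new_check_skips() and self_check() are hypothetical helper
names, not anything in the sunrpc code.

```c
/* User-space sketch of the kernel's error-pointer convention.
 * Simplified stand-ins for illustration; not the kernel's actual code. */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Encode a negative errno value as a pointer near the top of the address space. */
static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

/* Recover the errno value from an error pointer. */
static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* True if ptr falls in the last MAX_ERRNO bytes of the address space. */
static inline bool IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

static inline bool IS_ERR_OR_NULL(const void *ptr)
{
	return !ptr || IS_ERR(ptr);
}

/* Hypothetical helper: the check before the patch. */
static bool old_check_skips(const void *dentry)
{
	return !dentry;
}

/* Hypothetical helper: the check after the patch. */
static bool new_check_skips(const void *dentry)
{
	return IS_ERR_OR_NULL(dentry);
}

/* Hypothetical self-test covering the three cases of interest. */
static void self_check(void)
{
	int valid_object = 0;
	void *failed = ERR_PTR(-EACCES);

	/* Old check: -EACCES is non-NULL, so the code goes on to dereference it. */
	assert(!old_check_skips(failed));
	/* New check: both the error pointer and NULL take the early return. */
	assert(new_check_skips(failed));
	assert(new_check_skips(NULL));
	/* A real object is still treated as valid. */
	assert(!new_check_skips(&valid_object));
	assert(PTR_ERR(failed) == -EACCES);
}
```

Calling self_check() from a main() should pass silently; the first assertion
is the interesting one, since it is the case that oopsed.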