On 10/01/2022 10:38, Wen Gu wrote:
> We encountered a crash in smc_setsockopt() and it is caused by
> accessing smc->clcsock after clcsock was released.
>
>  BUG: kernel NULL pointer dereference, address: 0000000000000020
>  #PF: supervisor read access in kernel mode
>  #PF: error_code(0x0000) - not-present page
>  PGD 0 P4D 0
>  Oops: 0000 [#1] PREEMPT SMP PTI
>  CPU: 1 PID: 50309 Comm: nginx Kdump: loaded Tainted: G E 5.16.0-rc4+ #53
>  RIP: 0010:smc_setsockopt+0x59/0x280 [smc]
>  Call Trace:
>   <TASK>
>   __sys_setsockopt+0xfc/0x190
>   __x64_sys_setsockopt+0x20/0x30
>   do_syscall_64+0x34/0x90
>   entry_SYSCALL_64_after_hwframe+0x44/0xae
>  RIP: 0033:0x7f16ba83918e
>   </TASK>
>
> This patch tries to fix it by holding clcsock_release_lock and
> checking whether clcsock has already been released. In case that
> a crash of the same reason happens in smc_getsockopt(), this patch
> also checkes smc->clcsock in smc_getsockopt().
>
> Signed-off-by: Wen Gu <guwen@xxxxxxxxxxxxxxxxx>
> ---
>  net/smc/af_smc.c | 16 +++++++++++++++-
>  1 file changed, 15 insertions(+), 1 deletion(-)
>
> diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
> index 1c9289f..af423f4 100644
> --- a/net/smc/af_smc.c
> +++ b/net/smc/af_smc.c
> @@ -2441,6 +2441,11 @@ static int smc_setsockopt(struct socket *sock, int level, int optname,
>  	/* generic setsockopts reaching us here always apply to the
>  	 * CLC socket
>  	 */
> +	mutex_lock(&smc->clcsock_release_lock);
> +	if (!smc->clcsock) {
> +		mutex_unlock(&smc->clcsock_release_lock);
> +		return -EBADF;
> +	}
>  	if (unlikely(!smc->clcsock->ops->setsockopt))
>  		rc = -EOPNOTSUPP;
>  	else
> @@ -2450,6 +2455,7 @@ static int smc_setsockopt(struct socket *sock, int level, int optname,
>  		sk->sk_err = smc->clcsock->sk->sk_err;
>  		sk_error_report(sk);
>  	}
> +	mutex_unlock(&smc->clcsock_release_lock);

In the switch() the function smc_switch_to_fallback() might be called which also
accesses smc->clcsock without further checking. This should also be protected then?
Also from all callers of smc_switch_to_fallback()?

There are more uses of smc->clcsock (e.g. smc_bind(), ...), so why does this problem
happen in setsockopt() for you only? I suspect it depends on the test case.

I wonder if it makes sense to check and protect smc->clcsock at all places in the code
where it is used... as of now we had no such races like you encountered. But I see that
in theory this problem could also happen in other code areas.

>
>  	if (optlen < sizeof(int))
>  		return -EINVAL;
> @@ -2509,13 +2515,21 @@ static int smc_getsockopt(struct socket *sock, int level, int optname,
>  			  char __user *optval, int __user *optlen)
>  {
>  	struct smc_sock *smc;
> +	int rc;
>
>  	smc = smc_sk(sock->sk);
> +	mutex_lock(&smc->clcsock_release_lock);
> +	if (!smc->clcsock) {
> +		mutex_unlock(&smc->clcsock_release_lock);
> +		return -EBADF;
> +	}
>  	/* socket options apply to the CLC socket */
>  	if (unlikely(!smc->clcsock->ops->getsockopt))
>  		return -EOPNOTSUPP;
> -	return smc->clcsock->ops->getsockopt(smc->clcsock, level, optname,
> +	rc = smc->clcsock->ops->getsockopt(smc->clcsock, level, optname,
>  					     optval, optlen);
> +	mutex_unlock(&smc->clcsock_release_lock);
> +	return rc;
>  }
>
>  static int smc_ioctl(struct socket *sock, unsigned int cmd,

-- 
Karsten