On 2022/1/13 11:02 pm, Wen Gu wrote:
We encountered a crash in smc_setsockopt(). It is caused by accessing smc->clcsock after clcsock was released:

 BUG: kernel NULL pointer dereference, address: 0000000000000020
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
 PGD 0 P4D 0
 Oops: 0000 [#1] PREEMPT SMP PTI
 CPU: 1 PID: 50309 Comm: nginx Kdump: loaded Tainted: G E 5.16.0-rc4+ #53
 RIP: 0010:smc_setsockopt+0x59/0x280 [smc]
 Call Trace:
  <TASK>
  __sys_setsockopt+0xfc/0x190
  __x64_sys_setsockopt+0x20/0x30
  do_syscall_64+0x34/0x90
  entry_SYSCALL_64_after_hwframe+0x44/0xae
 RIP: 0033:0x7f16ba83918e
  </TASK>

This patch tries to fix it by holding clcsock_release_lock and checking whether clcsock has already been released before accessing it.

To prevent a crash with the same cause in smc_getsockopt() or smc_switch_to_fallback(), this patch also checks smc->clcsock in those functions. The callers of smc_switch_to_fallback() then use its return value to determine whether the fallback succeeded.

Signed-off-by: Wen Gu <guwen@xxxxxxxxxxxxxxxxx>
---
 net/smc/af_smc.c | 63 +++++++++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 51 insertions(+), 12 deletions(-)
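A minimal userspace sketch of the check-under-lock pattern described in the commit message, assuming a pthread_mutex_t as a stand-in for the kernel's clcsock_release_lock. struct clc_socket, its sndbuf field, and all *_sketch function names are hypothetical illustrations, not the actual af_smc.c code; returning -EBADF for an already-released clcsock is also an assumption made here.

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for the underlying TCP (CLC) socket. */
struct clc_socket {
	int sndbuf; /* example option value the setsockopt path writes */
};

/* Hypothetical, heavily reduced stand-in for struct smc_sock. */
struct smc_sock {
	pthread_mutex_t clcsock_release_lock; /* serializes release vs. users */
	struct clc_socket *clcsock;           /* NULL once released */
	int use_fallback;
};

/* Release path: clear the pointer while holding the lock. */
static void smc_release_clcsock_sketch(struct smc_sock *smc)
{
	pthread_mutex_lock(&smc->clcsock_release_lock);
	smc->clcsock = NULL; /* the kernel would also release the socket here */
	pthread_mutex_unlock(&smc->clcsock_release_lock);
}

/* setsockopt path: check clcsock under the same lock before using it. */
static int smc_setsockopt_sketch(struct smc_sock *smc, int val)
{
	pthread_mutex_lock(&smc->clcsock_release_lock);
	if (!smc->clcsock) {
		pthread_mutex_unlock(&smc->clcsock_release_lock);
		return -EBADF; /* assumed error code */
	}
	smc->clcsock->sndbuf = val;
	pthread_mutex_unlock(&smc->clcsock_release_lock);
	return 0;
}

/* Fallback now reports failure instead of silently proceeding. */
static int smc_switch_to_fallback_sketch(struct smc_sock *smc)
{
	int rc = 0;

	pthread_mutex_lock(&smc->clcsock_release_lock);
	if (!smc->clcsock)
		rc = -EBADF; /* assumed error code */
	else
		smc->use_fallback = 1;
	pthread_mutex_unlock(&smc->clcsock_release_lock);
	return rc;
}

/* Hypothetical caller: stop if the fallback could not be performed. */
static int smc_connect_sketch(struct smc_sock *smc)
{
	int rc = smc_switch_to_fallback_sketch(smc);

	if (rc)
		return rc; /* do not continue with a released clcsock */
	return 0;
}
```

Because the release path clears clcsock under the same lock that the setsockopt and fallback paths take, a reader that observes a non-NULL clcsock while holding the lock can use it safely until it unlocks.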
Sorry to bother you, but I wonder whether this patch needs further improvement. The previous discussion can be found at: https://lore.kernel.org/lkml/5dd7ffd1-28e2-24cc-9442-1defec27375e@xxxxxxxxxxxxx/T/

I sent this patch with a new subject instead of as a v2 of the previously discussed patch, because the original subject no longer seemed appropriate after adding the clcsock check in smc_switch_to_fallback().

Thanks,
Wen Gu