Re: [PATCH net] net/smc: Transitional solution for clcsock race issue

Wen Gu <guwen@xxxxxxxxxxxxxxxxx> · Fri, 21 Jan 2022 15:05:02 +0800

On 2022/1/13 11:02 pm, Wen Gu wrote:
We encountered a crash in smc_setsockopt(), caused by accessing
smc->clcsock after clcsock was released.

  BUG: kernel NULL pointer dereference, address: 0000000000000020
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD 0 P4D 0
  Oops: 0000 [#1] PREEMPT SMP PTI
  CPU: 1 PID: 50309 Comm: nginx Kdump: loaded Tainted: G E     5.16.0-rc4+ #53
  RIP: 0010:smc_setsockopt+0x59/0x280 [smc]
  Call Trace:
   <TASK>
   __sys_setsockopt+0xfc/0x190
   __x64_sys_setsockopt+0x20/0x30
   do_syscall_64+0x34/0x90
   entry_SYSCALL_64_after_hwframe+0x44/0xae
  RIP: 0033:0x7f16ba83918e
   </TASK>

This patch tries to fix it by holding clcsock_release_lock and
checking whether clcsock has already been released before access.
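
As a rough illustration of the lock-then-check idea described above,
here is a minimal userspace sketch. It is not the patch itself: the
fake_* types and functions are hypothetical stand-ins, a pthread mutex
stands in for the kernel mutex, and the -EBADF return value is an
assumption made for the example.

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; not the real SMC types. */
struct fake_clcsock {
	int last_optname;
};

struct fake_smc_sock {
	pthread_mutex_t clcsock_release_lock;
	struct fake_clcsock *clcsock;	/* NULL once released */
};

/* Take the release lock, then verify clcsock still exists before using it. */
static int fake_smc_setsockopt(struct fake_smc_sock *smc, int optname)
{
	int rc = 0;

	pthread_mutex_lock(&smc->clcsock_release_lock);
	if (!smc->clcsock) {
		rc = -EBADF;	/* assumed error code for this sketch */
		goto out;
	}
	smc->clcsock->last_optname = optname;
out:
	pthread_mutex_unlock(&smc->clcsock_release_lock);
	return rc;
}

/* The release path clears the pointer under the same lock. */
static void fake_smc_release_clcsock(struct fake_smc_sock *smc)
{
	pthread_mutex_lock(&smc->clcsock_release_lock);
	smc->clcsock = NULL;
	pthread_mutex_unlock(&smc->clcsock_release_lock);
}
```

Because the release path and the accessor serialize on the same lock,
the accessor either sees a valid clcsock for its whole critical section
or sees NULL and bails out; it never dereferences a pointer that is
cleared halfway through.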

To prevent a crash for the same reason in smc_getsockopt() or
smc_switch_to_fallback(), this patch also checks smc->clcsock in
them. The caller of smc_switch_to_fallback() determines whether the
fallback succeeded from its return value.
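
The return-value handling can be pictured with a small userspace
sketch. The demo_* names are hypothetical, the pthread mutex stands in
for clcsock_release_lock, and -EBADF is an assumed error code; the
actual changes are in net/smc/af_smc.c.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct smc_sock; not the real kernel type. */
struct demo_sock {
	pthread_mutex_t clcsock_release_lock;
	void *clcsock;		/* NULL once released */
	bool use_fallback;
};

/* Returns 0 on success, or a negative errno if clcsock is already gone. */
static int demo_switch_to_fallback(struct demo_sock *smc)
{
	int rc = 0;

	pthread_mutex_lock(&smc->clcsock_release_lock);
	if (!smc->clcsock)
		rc = -EBADF;	/* assumed error code for this sketch */
	else
		smc->use_fallback = true;
	pthread_mutex_unlock(&smc->clcsock_release_lock);
	return rc;
}

/* Hypothetical caller: continues on the fallback path only on success. */
static int demo_connect(struct demo_sock *smc)
{
	int rc = demo_switch_to_fallback(smc);

	if (rc)
		return rc;
	/* ... continue on the TCP fallback path using smc->clcsock ... */
	return 0;
}
```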

Signed-off-by: Wen Gu <guwen@xxxxxxxxxxxxxxxxx>
---
  net/smc/af_smc.c | 63 +++++++++++++++++++++++++++++++++++++++++++++-----------
  1 file changed, 51 insertions(+), 12 deletions(-)

Sorry to bother you. I am just wondering whether this patch needs
further improvement.

The previous discussion can be found at:
https://lore.kernel.org/lkml/5dd7ffd1-28e2-24cc-9442-1defec27375e@xxxxxxxxxxxxx/T/

I sent this patch with a new subject instead of sending a v2 of the
previously discussed patch because I think the original subject is no
longer appropriate now that the patch also checks clcsock in
smc_switch_to_fallback().

Thanks,
Wen Gu