> > > >> This is a rather unusual problem that can come up when fallback=true BEFORE smc_connect()
> > > >> is called. But nevertheless, it is a problem.
> > > >>
> > > >> Right now I am not sure if it is okay when we NOT hold a ref to smc->sk during all fallback
> > > >> processing. This change also conflicts with a patch that is already on net-next (3aba1030).
> > > >
> > > > Do you mean put the ref to smc->sk during all fallback processing unconditionally and remove
> > > > the fallback branch sock_put() in __smc_release()?
> > >
> > > What I had in mind was to eventually call sock_put() in __smc_release() even if sk->sk_state == SMC_INIT
> > > (currently the extra check in the if() for sk->sk_state != SMC_INIT prevents the sock_put()), but only
> > > when it is sure that we actually reached the sock_hold() in smc_connect() before.
> > >
> > > But maybe we find out that the sock_hold() is not needed for fallback sockets, I don't know...
> >
> > I do think the sock_hold()/sock_put() for smc->sk is a bit complicated, Emm, I'm not sure if it
> > can be simplified..
> >
> > In fact, I'm sure there must be another ref count issue in my environment,but I haven't caught it yet.
>
> I am wondering the issue of this ref count. If it is convenient, would
> you like to provide some more details?
>
> syzkaller has reported some issues about ref count, but syzkaller and
> others' bot don't have RDMA devices, they cannot cover most of the code
> routines in SMC. We are working on it to provide SMC fuzz test with RDMA
> environment. So it's very nice to have real world issues.
>
> Thanks,
> Tony Lu

I have encountered two types of problems. However, I cannot reproduce them reliably.

case 1. After closing the app (>> TIME_WAIT), 'lsmod' shows that the smc module ref count is still greater than 0.
case 2 [rare]. 'lsmod' shows that the smc module ref count is less than 0.

Some clues of case 2 are as follows:

kernel: [67166.688386] ------------[ cut here ]------------
kernel: [67166.693658] cache_from_obj: Wrong slab cache. SMC but object is from SMC
kernel: [67166.701136] WARNING: CPU: 47 PID: 176961 at mm/slab.h:469 kmem_cache_free+0x329/0x410
......
kernel: [67166.846819] CPU: 47 PID: 176961 Comm: redis-server Kdump: loaded Tainted: G R B OE 5.10.0-0.bpo.9-amd64 #1 Debian 5.10.70-1~bpo10+1
kernel: [67166.860915] Hardware name: Inspur SA5280M6/SA5280M6, BIOS 06.00.01 10/09/2021
kernel: [67166.868747] RIP: 0010:kmem_cache_free+0x329/0x410
kernel: [67166.874168] Code: ff 0f 0b 48 8d b8 f0 9d 02 00 e9 e4 fe ff ff 48 8b 57 60 49 8b 4f 60 48 c7 c6 30 86 63 a4 48 c7 c7 f8 e6 8f a4 e8 89 63 5c 00 <0f> 0b 48 89 de 4c 89 ff e8 1a ad ff ff 48 8b 0d 63 34 ef 00 e9 49
kernel: [67166.894360] RSP: 0018:ffffbd450f527e18 EFLAGS: 00010286
kernel: [67166.900306] RAX: 0000000000000000 RBX: ffffa00fa4548d00 RCX: 0000000000000000
kernel: [67166.908169] RDX: ffffa04c7f7e8760 RSI: ffffa04c7f7d8a00 RDI: ffffa04c7f7d8a00
kernel: [67166.916027] RBP: ffffa01024548d00 R08: 0000000000000000 R09: c0000000ffffbfff
kernel: [67166.923860] R10: 0000000000000001 R11: ffffbd450f527c20 R12: 0000000000000000
kernel: [67166.931713] R13: 0000000000000000 R14: ffffa00fa4548f28 R15: ffffa02d3366bf00
kernel: [67166.939564] FS: 00007fe131c80f40(0000) GS:ffffa04c7f7c0000(0000) knlGS:0000000000000000
kernel: [67166.948361] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: [67166.954817] CR2: 00007fe12f477000 CR3: 00000004874be003 CR4: 0000000000770ee0
kernel: [67166.962662] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
kernel: [67166.970498] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
kernel: [67166.978306] PKRU: 55555554
kernel: [67166.981695] Call Trace:
kernel: [67166.985017]  __sk_destruct+0x12c/0x1e0
kernel: [67166.989449]  smc_release+0x19a/0x230 [smc]
kernel: [67166.994325]  __sock_release+0x3d/0xa0
kernel: [67166.998656]  sock_close+0x11/0x20
kernel: [67167.002637]  __fput+0x93/0x240
kernel: [67167.006347]  task_work_run+0x76/0xb0
kernel: [67167.010569]  exit_to_user_mode_prepare+0x129/0x130
kernel: [67167.016000]  syscall_exit_to_user_mode+0x28/0x140
kernel: [67167.021339]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
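
To make the idea from the quoted discussion concrete (drop the extra reference in __smc_release() only when the sock_hold() in smc_connect() was actually reached), here is a toy user-space sketch. It is not the SMC code and not a proposed patch: struct toy_sock, the connect_held flag and the toy_*() helpers are hypothetical names used only for illustration, and a plain int stands in for the kernel's socket refcount.

```c
#include <stdbool.h>

/* Toy model only; this is NOT struct sock or struct smc_sock. */
struct toy_sock {
	int refcnt;        /* stands in for the socket reference count */
	bool connect_held; /* hypothetical: set once connect took its extra ref */
};

/* A new socket starts with one reference owned by its creator. */
static void toy_init(struct toy_sock *sk)
{
	sk->refcnt = 1;
	sk->connect_held = false;
}

/* Models the sock_hold() taken on the connect path. */
static void toy_connect(struct toy_sock *sk)
{
	sk->refcnt++;
	sk->connect_held = true;
}

/*
 * Models release: drop the connect reference only if it was really
 * taken, then drop the creator's reference. Returns the final count.
 */
static int toy_release(struct toy_sock *sk)
{
	if (sk->connect_held) {
		sk->refcnt--;
		sk->connect_held = false;
	}
	sk->refcnt--;
	return sk->refcnt;
}
```

In this model, toy_release() ends at a count of 0 both when toy_connect() ran first and when it never ran, which is the balance the discussion above is trying to guarantee. An unconditional extra put would instead end at -1 for a socket that never went through connect.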