On 21/12/2021 09:04, Tony Lu wrote:
> Hello,
>
> During development and testing of SMC (net/smc), we found a problem:
> when SMC released a link group or link, it called ib_dereg_mr to
> release resources and then panicked in mlx5_ib_dereg_mr. After
> investigation, we found that this panic was introduced by this commit:
>
> f0ae4afe3d35 ("RDMA/mlx5: Fix releasing unallocated memory in dereg MR flow")

+1, we see this panic in our environment:

[ 380.055202] smc: SMC-R lg 00000200 link removed: id 00000201, peerid 00000101, ibdev mlx5_0, ibport 1
[ 380.055230] smc: SMC-R lg 00000100 state changed: SINGLE, pnetid NET10
[ 380.055605] Unable to handle kernel pointer dereference in virtual kernel address space
[ 380.055607] Failing address: 7563745f64657000 TEID: 7563745f64657803
[ 380.055609] Fault in home space mode while using kernel ASCE.
[ 380.055613] AS:0000000124abc007 R3:0000000000000024
[ 380.055650] Oops: 0038 ilc:3 [#1] SMP
[ 380.055655] Modules linked in: dummy smc_diag smc tcp_diag ...
[ 380.055698] CPU: 2 PID: 21939 Comm: kworker/2:22 Not tainted 5.16.0-20211220.rc5.git0.c4a510cd6ab8.300.fc35.s390x #1
[ 380.055700] Hardware name: IBM 8561 T01 701 (z/VM 7.2.0)
[ 380.055702] Workqueue: events smc_link_down_work [smc]
[ 380.055717] Krnl PSW : 0704e00180000000 000000012311abbc (dma_unmap_sg_attrs+0x1c/0x68)
[ 380.055729]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3
[ 380.055732] Krnl GPRS: 0000000000000018 000000012311aba0 7563745f64657461 000000010232f003
[ 380.055735]            0000000002330003 0000000000000000 0000000000000000 0000000000000000
[ 380.055738]            0000000000000000 000000008fe64000 0000000084cd6000 000000008fe64000
[ 380.055740]            0000000035244200 00000000b669c248 000003800a077a68 000003800a077a10
[ 380.055748] Krnl Code: 000000012311abac: b90400ef      lgr   %r14,%r15
                          000000012311abb0: e3f0ffa8ff71  lay   %r15,-88(%r15)
                         #000000012311abb6: e3e0f0980024  stg   %r14,152(%r15)
                         >000000012311abbc: e3b021300002  ltg   %r11,304(%r2)
                          000000012311abc2: a7840013      brc   8,000000012311abe8
                          000000012311abc6: ec52001d027f  clij  %r5,2,2,000000012311ac00
                          000000012311abcc: e310b0580002  ltg   %r1,88(%r11)
                          000000012311abd2: a7840005      brc   8,000000012311abdc
[ 380.055775] Call Trace:
[ 380.055777] [<000000012311abbc>] dma_unmap_sg_attrs+0x1c/0x68
[ 380.055780] [<000003ff80560bd2>] __ib_umem_release+0xc2/0xd8 [ib_uverbs]
[ 380.055797] [<000003ff805610a6>] ib_umem_release+0x4e/0xe0 [ib_uverbs]
[ 380.055806] [<000003ff804fe7ca>] mlx5_ib_dereg_mr.localalias+0x212/0x480 [mlx5_ib]
[ 380.055830] [<000003ff803a0ddc>] ib_dereg_mr_user+0x5c/0xe0 [ib_core]
[ 380.055878] [<000003ff806c249c>] smcr_buf_unmap_link+0x64/0xe0 [smc]
[ 380.055887] [<000003ff806c2cb2>] smcr_link_clear.part.0+0x72/0x230 [smc]
[ 380.055896] [<000003ff806c6364>] smcr_link_down+0xc4/0x1b8 [smc]
[ 380.055902] [<000003ff806c64be>] smc_link_down_work+0x66/0x88 [smc]
[ 380.055909] [<00000001230a2b02>] process_one_work+0x1fa/0x470
[ 380.055913] [<00000001230a32a4>] worker_thread+0x64/0x498
[ 380.055915] [<00000001230aaf5c>] kthread+0x17c/0x188
[ 380.055919] [<00000001230333c4>] __ret_from_fork+0x3c/0x58
[ 380.055922] [<0000000123bc46ba>] ret_from_fork+0xa/0x40
[ 380.055927] Last Breaking-Event-Address:
[ 380.055929] [<000003ff8054e2a8>] 0x3ff8054e2a8
[ 380.055940] Kernel panic - not syncing: Fatal exception: panic_on_oops
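
A side note on reading the register dump, offered as an observation rather than a confirmed diagnosis: the faulting instruction is `ltg %r11,304(%r2)`, and `%r2` (which, assuming the usual s390x calling convention, carries the first argument of dma_unmap_sg_attrs) holds 7563745f64657461. Those eight bytes are all printable ASCII, which would be consistent with the pointer having been read from memory that does not actually hold a valid structure. The small Python sketch below only re-derives that from the values printed in the log; the helper names are made up for illustration.

```python
# Values copied from the oops log in this mail.
R2 = 0x7563745F64657461               # third word of "Krnl GPRS" (%r2)
DISPLACEMENT = 304                    # from "ltg %r11,304(%r2)"
FAILING_ADDRESS = 0x7563745F64657000  # from "Failing address:"
PAGE_MASK = ~0xFFF                    # assumes 4 KiB pages


def reg_as_ascii(value: int) -> str:
    """Interpret a 64-bit register value as eight big-endian ASCII bytes."""
    return value.to_bytes(8, "big").decode("ascii")


def faulting_page(base: int, displacement: int) -> int:
    """Page-aligned address touched by a base + displacement load."""
    return (base + displacement) & PAGE_MASK


if __name__ == "__main__":
    print(reg_as_ascii(R2))                         # uct_deta
    print(hex(faulting_page(R2, DISPLACEMENT)))     # 0x7563745f64657000
```

The computed page matches the "Failing address" line, so the fault does come from dereferencing `%r2`; the ASCII content ("uct_deta") hints that the value is string data rather than a device pointer.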