Re: RDMA/mlx5: Regression since v5.15-rc5: Kernel panic when called ib_dereg_mr

On 21/12/2021 09:04, Tony Lu wrote:
> Hello,
> 
> While developing and testing SMC (net/smc), we found a problem:
> when SMC released a link group or link, it called ib_dereg_mr to release
> resources and then panicked in mlx5_ib_dereg_mr. After investigation,
> we found this panic was introduced by this commit:
> 
>     f0ae4afe3d35 ("RDMA/mlx5: Fix releasing unallocated memory in dereg MR flow")

+1, we see the same panic in our environment:

[  380.055202] smc: SMC-R lg 00000200 link removed: id 00000201, peerid 00000101, ibdev mlx5_0, ibport 1
[  380.055230] smc: SMC-R lg 00000100 state changed: SINGLE, pnetid NET10           
[  380.055605] Unable to handle kernel pointer dereference in virtual kernel address space
[  380.055607] Failing address: 7563745f64657000 TEID: 7563745f64657803
[  380.055609] Fault in home space mode while using kernel ASCE.
[  380.055613] AS:0000000124abc007 R3:0000000000000024 
[  380.055650] Oops: 0038 ilc:3 [#1] SMP 
[  380.055655] Modules linked in: dummy smc_diag smc tcp_diag ...
[  380.055698] CPU: 2 PID: 21939 Comm: kworker/2:22 Not tainted 5.16.0-20211220.rc5.git0.c4a510cd6ab8.300.fc35.s390x #1
[  380.055700] Hardware name: IBM 8561 T01 701 (z/VM 7.2.0)
[  380.055702] Workqueue: events smc_link_down_work [smc]
[  380.055717] Krnl PSW : 0704e00180000000 000000012311abbc (dma_unmap_sg_attrs+0x1c/0x68)
[  380.055729]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3
[  380.055732] Krnl GPRS: 0000000000000018 000000012311aba0 7563745f64657461 000000010232f003
[  380.055735]            0000000002330003 0000000000000000 0000000000000000 0000000000000000
[  380.055738]            0000000000000000 000000008fe64000 0000000084cd6000 000000008fe64000
[  380.055740]            0000000035244200 00000000b669c248 000003800a077a68 000003800a077a10
[  380.055748] Krnl Code: 000000012311abac: b90400ef		lgr	%r14,%r15
                          000000012311abb0: e3f0ffa8ff71	lay	%r15,-88(%r15)
                         #000000012311abb6: e3e0f0980024	stg	%r14,152(%r15)
                         >000000012311abbc: e3b021300002	ltg	%r11,304(%r2)
                          000000012311abc2: a7840013		brc	8,000000012311abe8
                          000000012311abc6: ec52001d027f	clij	%r5,2,2,000000012311ac00
                          000000012311abcc: e310b0580002	ltg	%r1,88(%r11)
                          000000012311abd2: a7840005		brc	8,000000012311abdc
[  380.055775] Call Trace:
[  380.055777]  [<000000012311abbc>] dma_unmap_sg_attrs+0x1c/0x68 
[  380.055780]  [<000003ff80560bd2>] __ib_umem_release+0xc2/0xd8 [ib_uverbs] 
[  380.055797]  [<000003ff805610a6>] ib_umem_release+0x4e/0xe0 [ib_uverbs] 
[  380.055806]  [<000003ff804fe7ca>] mlx5_ib_dereg_mr.localalias+0x212/0x480 [mlx5_ib] 
[  380.055830]  [<000003ff803a0ddc>] ib_dereg_mr_user+0x5c/0xe0 [ib_core] 
[  380.055878]  [<000003ff806c249c>] smcr_buf_unmap_link+0x64/0xe0 [smc] 
[  380.055887]  [<000003ff806c2cb2>] smcr_link_clear.part.0+0x72/0x230 [smc] 
[  380.055896]  [<000003ff806c6364>] smcr_link_down+0xc4/0x1b8 [smc] 
[  380.055902]  [<000003ff806c64be>] smc_link_down_work+0x66/0x88 [smc] 
[  380.055909]  [<00000001230a2b02>] process_one_work+0x1fa/0x470 
[  380.055913]  [<00000001230a32a4>] worker_thread+0x64/0x498 
[  380.055915]  [<00000001230aaf5c>] kthread+0x17c/0x188 
[  380.055919]  [<00000001230333c4>] __ret_from_fork+0x3c/0x58 
[  380.055922]  [<0000000123bc46ba>] ret_from_fork+0xa/0x40 
[  380.055927] Last Breaking-Event-Address:
[  380.055929]  [<000003ff8054e2a8>] 0x3ff8054e2a8
[  380.055940] Kernel panic - not syncing: Fatal exception: panic_on_oops
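
One observation from the register dump, offered as a reading of the log rather than a confirmed diagnosis: the faulting instruction is `ltg %r11,304(%r2)`, and on s390x %r2 carries the first function argument, which for dma_unmap_sg_attrs is the device pointer. The value in %r2 (7563745f64657461) looks like printable ASCII rather than a kernel address, which would suggest the pointer was read from memory that actually holds string data. The short Python sketch below decodes the register and recomputes the failing page; the helper names are made up for illustration, and the 4 KiB page size is an assumption.

```python
# Sketch: check whether a register value from the oops is printable ASCII,
# and whether register + displacement lands on the reported failing page.
# Helper names are hypothetical; the 4 KiB page size is an assumption.

def decode_ascii(reg_hex: str) -> str:
    """Render the register bytes as text, using '.' for non-printable bytes.

    s390x is big-endian, so the byte order in the register dump matches
    the byte order in memory.
    """
    raw = bytes.fromhex(reg_hex)
    return "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in raw)


def fault_page(reg_hex: str, displacement: int, page_size: int = 4096) -> int:
    """Return the page-aligned address touched by displacement(register)."""
    addr = (int(reg_hex, 16) + displacement) & 0xFFFFFFFFFFFFFFFF
    return addr & ~(page_size - 1)


R2 = "7563745f64657461"  # third value on the "Krnl GPRS" line, i.e. %r2

print(decode_ascii(R2))              # uct_deta
print(f"{fault_page(R2, 304):016x}")  # 7563745f64657000
```

The recomputed page matches the "Failing address" line in the log, so the fault is consistent with dereferencing %r2 at offset 304.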


