On Tue, Aug 16, 2022 at 11:35:15AM +0200, Jan Karcher wrote: > > > On 10.08.2022 19:47, D. Wythe wrote: > > From: "D. Wythe" <alibuda@xxxxxxxxxxxxxxxxx> > > > > This patch set attempts to optimize the parallelism of SMC-R connections, > > mainly to reduce unnecessary blocking on locks, and to fix exceptions that > > occur after thoses optimization. > > > > Thank you again for your submission! > Let me give you a quick update from our side: > We tested your patches on top of the net-next kernel on our s390 systems. > They did crash our systems. After verifying our environment we pulled > console logs and now we can tell that there is indeed a problem with your > patches regarding SMC-D. So please do not integrate this change as of right > now. I'm going to do more in depth reviews of your patches but i need some > time for them so here is a quick a description of the problem: > > It is a SMC-D problem, that occurs while building up the connection. In > smc_conn_create you set struct smc_lnk_cluster *lnkc = NULL. For the SMC-R > path you do grab the pointer, for SMC-D that never happens. Still you are > using this refernce for SMC-D => Crash. This problem can be reproduced using > the SMC-D path. Here is an example console output: Got it. > > [ 779.516382] Unable to handle kernel pointer dereference in virtual kernel > address space > [ 779.516389] Failing address: 0000000000000000 TEID: 0000000000000483 > [ 779.516391] Fault in home space mode while using kernel ASCE. > [ 779.516395] AS:0000000069628007 R3:00000000ffbf0007 S:00000000ffbef800 > P:000000000000003d > [ 779.516431] Oops: 0004 ilc:2 [#1] SMP > [ 779.516436] Modules linked in: tcp_diag inet_diag ism mlx5_ib ib_uverbs > mlx5_core smc_diag smc ib_core nft_fib_inet nft_fib_ipv4 > nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 > nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv > 6 nf_defrag_ipv4 ip_set nf_tables n > [ 779.516470] CPU: 0 PID: 24 Comm: kworker/0:1 Not tainted > 5.19.0-13940-g22a46254655a #3 > [ 779.516476] Hardware name: IBM 8561 T01 701 (z/VM 7.2.0) > > [ 779.522738] Workqueue: smc_hs_wq smc_listen_work [smc] > [ 779.522755] Krnl PSW : 0704c00180000000 000003ff803da89c > (smc_conn_create+0x174/0x968 [smc]) > [ 779.522766] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 > RI:0 EA:3 > [ 779.522770] Krnl GPRS: 0000000000000002 0000000000000000 0000000000000001 > 0000000000000000 > [ 779.522773] 000000008a4128a0 000003ff803f21aa 000000008e30d640 > 0000000086d72000 > [ 779.522776] 0000000086d72000 000000008a412803 000000008a412800 > 000000008e30d650 > [ 779.522779] 0000000080934200 0000000000000000 000003ff803cb954 > 00000380002dfa88 > [ 779.522789] Krnl Code: 000003ff803da88e: e310f0e80024 stg > %r1,232(%r15) > [ 779.522789] 000003ff803da894: a7180000 lhi %r1,0 > [ 779.522789] #000003ff803da898: 582003ac l %r2,940 > [ 779.522789] >000003ff803da89c: ba123020 cs > %r1,%r2,32(%r3) > [ 779.522789] 000003ff803da8a0: ec1603be007e cij > %r1,0,6,000003ff803db01c > > [ 779.522789] 000003ff803da8a6: 4110b002 la > %r1,2(%r11) > [ 779.522789] 000003ff803da8aa: e310f0f00024 stg > %r1,240(%r15) > [ 779.522789] 000003ff803da8b0: e310f0c00004 lg > %r1,192(%r15) > [ 779.522870] Call Trace: > [ 779.522873] [<000003ff803da89c>] smc_conn_create+0x174/0x968 [smc] > [ 779.522884] [<000003ff803cb954>] smc_find_ism_v2_device_serv+0x1b4/0x300 > [smc] > 01: HCPGSP2629I The virtual machine is placed in CP mode due to a SIGP stop > from CPU 01. > 01: HCPGSP2629I The virtual machine is placed in CP mode due to a SIGP stop > from CPU 00. > [ 779.522894] [<000003ff803cbace>] smc_listen_find_device+0x2e/0x370 [smc] > > > I'm going to send the review for the first patch right away (which is the > one causing the crash), so far I'm done with it. The others are going to > follow. Maybe you can look over the problem and come up with a solution, > otherwise we are going to decide if we want to look into it as soon as I'm > done with the reviews. Thank you for your patience. Thanks for pointing this issue. We will fix this soon in v2. Tony Lu