On 03/06/2017 01:56 PM, Eli Cohen wrote:
> Please send information on:
>
> - The size of the required inline data in the offending work request
44 bytes (ib_create_qp with ib_qp_init_attr.cap.max_inline_data=44)
> - The transport service used
IB_QPT_RC
> - How many outstanding work requests the send queue is configured to
ib_create_cq with ib_cq_init_attr.cqe=32768
ib_create_qp with ib_qp_init_attr.cap.max_send_wr=16
> - What was the serial number of the work request that triggered this oops (first, second, 65th etc).
serial number wr_id=1
>
> -----Original Message-----
> From: Ursula Braun [mailto:ubraun@xxxxxxxxxxxxxxxxxx]
> Sent: Monday, March 6, 2017 5:17 AM
> To: Eli Cohen <eli@xxxxxxxxxxxx>; Matan Barak <matanb@xxxxxxxxxxxx>; Leon Romanovsky <leonro@xxxxxxxxxxxx>
> Cc: linux-rdma@xxxxxxxxxxxxxxx
> Subject: Re: mlx5_ib_post_send panic on s390x
>
>
> On 02/24/2017 06:28 PM, Eli Cohen wrote:
>> Hi,
>>
>> Can you please send details of the work request you are posting? I assume you are using inline, right?
> yes, inline is used:
>
> lnk->wr_tx_sges[i].addr =
>         lnk->wr_tx_dma_addr + i * SMC_WR_BUF_SIZE;
> lnk->wr_tx_sges[i].length = SMC_WR_TX_SIZE;
> lnk->wr_tx_sges[i].lkey = lnk->roce_pd->local_dma_lkey;
> lnk->wr_tx_ibs[i].next = NULL;
> lnk->wr_tx_ibs[i].sg_list = &lnk->wr_tx_sges[i];
> lnk->wr_tx_ibs[i].num_sge = 1;
> lnk->wr_tx_ibs[i].opcode = IB_WR_SEND;
> lnk->wr_tx_ibs[i].send_flags =
>         IB_SEND_SIGNALED | IB_SEND_SOLICITED | IB_SEND_INLINE;
>
>>
>> -----Original Message-----
>> From: linux-rdma-owner@xxxxxxxxxxxxxxx [mailto:linux-rdma-owner@xxxxxxxxxxxxxxx] On Behalf Of Ursula Braun
>> Sent: Friday, February 24, 2017 3:52 AM
>> To: matamb@xxxxxxxxxxxx; Leon Romanovsky <leonro@xxxxxxxxxxxx>
>> Cc: linux-rdma@xxxxxxxxxxxxxxx
>> Subject: mlx5_ib_post_send panic on s390x
>>
>> Hi Saeed and Matan,
>>
>> up to now I have run SMC-R traffic on Connect X3, which works.
>> But when switching to Connect X4, the first mlx5_ib_post_send() fails:
>>
>> [ 247.787660] Unable to handle kernel pointer dereference in virtual kernel address space
>> [ 247.787662] Failing address: 000000010484a000 TEID: 000000010484a803
>> [ 247.787664] Fault in home space mode while using kernel ASCE.
>> [ 247.787667] AS:00000000011ec007 R3:0000000000000024
>> [ 247.787701] Oops: 003b ilc:2 [#1] PREEMPT SMP
>> [ 247.787704] Modules linked in: smc_diag smc xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp bridge stp llc ip6table_filter ip6_tables iptable_filter rpcrdma rdma_ucm ib_ucm ib_uverbs rdma_cm configfs ib_cm iw_cm mlx5_ib ib_core mlx5_core xts gf128mul cbc ecb aes_s390 des_s390 des_generic ptp sha512_s390 pps_core sha256_s390 sha1_s390 sha_common eadm_sch nfsd auth_rpcgss oid_registry nfs_acl lockd vhost_net tun grace vhost sunrpc macvtap sch_fq_codel macvlan dm_multipath kvm dm_mod ip_tables x_tables autofs4
>> [ 247.787738] CPU: 0 PID: 10498 Comm: kworker/0:3 Tainted: G W 4.10.0uschi+ #4
>> [ 247.787739] Hardware name: IBM 2964 N96 704 (LPAR)
>> [ 247.787743] Workqueue: events smc_listen_work [smc]
>> [ 247.787745] task: 00000000b4148008 task.stack: 0000000099c2c000
>> [ 247.787746] Krnl PSW : 0404c00180000000 0000000000762412 (memcpy+0x22/0x48)
>> [ 247.787751]            R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
>> [ 247.787753] Krnl GPRS: 0000000000a7a100 0000000099c96414 0000000099c96414 000000010484afc8
>> [ 247.787755]            000000000000002b 000000000076242e 000000000000002c 0000000099c96440
>> [ 247.787757]            000000010484afc8 000000000000002c 0000000099c96414 0000000000000001
>> [ 247.787758]            00000000ae8a75d0 000003ff8108aa50 000003ff8107cde6 0000000099c2fa38
>> [ 247.787764] Krnl Code: 0000000000762404: b9040012 lgr %r1,%r2
>>                          0000000000762408: a7740008 brc 7,762418
>>                         #000000000076240c: c05000000011 larl %r5,76242e
>>                         >0000000000762412: 44405000 ex %r4,0(%r5)
>>                          0000000000762416: 07fe bcr 15,%r14
>>                          0000000000762418: d2ff10003000 mvc 0(256,%r1),0(%r3)
>>                          000000000076241e: 41101100 la %r1,256(%r1)
>>                          0000000000762422: 41303100 la %r3,256(%r3)
>> [ 247.787780] Call Trace:
>> [ 247.787785] ([<000003ff8107cdd4>] mlx5_ib_post_send+0x139c/0x1810 [mlx5_ib])
>> [ 247.787789]  [<000003ff8047999a>] smc_wr_tx_send+0xd2/0x100 [smc]
>> [ 247.787792]  [<000003ff8047a97a>] smc_llc_send_confirm_link+0x9a/0xd0 [smc]
>> [ 247.787794]  [<000003ff804751ee>] smc_listen_work+0x24e/0x4e0 [smc]
>> [ 247.787797]  [<00000000001659e8>] process_one_work+0x3d8/0x780
>> [ 247.787799]  [<0000000000166044>] worker_thread+0x2b4/0x478
>> [ 247.787801]  [<000000000016e62c>] kthread+0x15c/0x170
>> [ 247.787803]  [<0000000000a115f2>] kernel_thread_starter+0x6/0xc
>> [ 247.787804]  [<0000000000a115ec>] kernel_thread_starter+0x0/0xc
>> [ 247.787806] INFO: lockdep is turned off.
>> [ 247.787807] Last Breaking-Event-Address:
>> [ 247.787811]  [<000003ff8106edc0>] 0x3ff8106edc0
>> [ 247.787813]
>> [ 247.787814] Kernel panic - not syncing: Fatal exception: panic_on_oops
>>
>> The problem seems to be caused by the usage of plain memcpy in set_data_inl_seg().
>> The address provided by the SMC code in struct ib_send_wr *wr is an address belonging to an area mapped with the ib_dma_map_single() call. On s390x this kind of address requires extra access functions (see arch/s390/include/asm/io.h).
>>
>> Kind regards, Ursula Braun (IBM Germany)
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@xxxxxxxxxxxxxxx
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>
>