On 2022/4/12 11:13, Bob Pearson wrote:
On 4/11/22 11:25, Pearson, Robert B wrote:
Zhu,
Would you be willing to try the v13 pool patch series? It also fixes the blktests bug.
(You have to apply Bart's scsi_debug revert patch to fix that issue.)
I think it may also fix this issue, because it is much more careful about deferring the
qp cleanup code until after all the packets have completed.
The bug you are seeing feels like a race with qp destroy.
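Roughly, the shape of that deferral, as an illustrative sketch only and not the actual v13 code (all the sketch_* names are made up), is to have every in-flight packet hold a reference so that the final put, not the destroy verb, is what releases the qp:

#include <linux/kref.h>
#include <linux/slab.h>

struct sketch_qp {
	struct kref ref;
	/* ... qp state, responder resources, etc. ... */
};

static void sketch_qp_release(struct kref *kref)
{
	struct sketch_qp *qp = container_of(kref, struct sketch_qp, ref);

	/* real cleanup would tear down the responder resources here */
	kfree(qp);
}

/* each packet takes a reference while it is being processed */
static void sketch_pkt_begin(struct sketch_qp *qp)
{
	kref_get(&qp->ref);
}

/* the final put, from destroy or the last packet, frees the qp */
static void sketch_pkt_end(struct sketch_qp *qp)
{
	kref_put(&qp->ref, sketch_qp_release);
}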
Bob
-----Original Message-----
From: Zhu Yanjun <zyjzyj2000@xxxxxxxxx>
Sent: Monday, April 11, 2022 12:34 AM
To: Bob Pearson <rpearsonhpe@xxxxxxxxx>
Cc: linux-rdma@xxxxxxxxxxxxxxx
Subject: Re: null pointer in rxe_mr_copy()
On Mon, Apr 11, 2022 at 1:14 PM Zhu Yanjun <zyjzyj2000@xxxxxxxxx> wrote:
On Mon, Apr 11, 2022 at 11:34 AM Bob Pearson <rpearsonhpe@xxxxxxxxx> wrote:
Zhu,
Since checking for mr == NULL in rxe_mr_copy fixes the problem you were seeing in rping,
perhaps it would be a good idea to apply the following patch, which
would tell us which of the three calls to rxe_mr_copy is failing. My
suspicion is the one in read_reply().
Hi, Bob
Yes. It is the function read_reply.
720 static enum resp_states read_reply(struct rxe_qp *qp,
721                                    struct rxe_pkt_info *req_pkt)
722 {
723         struct rxe_pkt_info ack_pkt;
724         struct sk_buff *skb;
725         int mtu = qp->mtu;
726         enum resp_states state;
727         int payload;
728         int opcode;
729         int err;
730         struct resp_res *res = qp->resp.res;
731         struct rxe_mr *mr;
732
733         if (!res) {
734                 res = rxe_prepare_read_res(qp, req_pkt);
735                 qp->resp.res = res;
736         }
737
738         if (res->state == rdatm_res_state_new) {
739                 mr = qp->resp.mr;
                    <---- It seems that mr comes from here.
740                 qp->resp.mr = NULL;
741
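So when qp cleanup has already released qp->resp.mr, mr is NULL here and the later rxe_mr_copy() call trips over it. Just as a sketch, not a tested fix (the choice of return state is only a guess on my part), a defensive check at this spot could look like:

        if (res->state == rdatm_res_state_new) {
                mr = qp->resp.mr;
                if (!mr)        /* resp.mr already released, e.g. qp cleanup raced with us */
                        return RESPST_ERR_RNR;  /* placeholder state, sketch only */
                qp->resp.mr = NULL;

And here is the warning: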
kernel: ------------[ cut here ]------------
kernel: WARNING: CPU: 74 PID: 38510 at drivers/infiniband/sw/rxe/rxe_resp.c:768 rxe_responder+0x1d67/0x1dd0 [rdma_rxe]
kernel: Modules linked in: rdma_rxe(OE) ip6_udp_tunnel udp_tunnel
rds_rdma rds xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT
nf_reject_ipv4 nft_compat nft_chain_nat nf_nat nf_conntrack
nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink tun bridge stp llc
vfat fat rpcrdma sunrpc rdma_ucm ib_srpt ib_isert iscsi_target_mod
target_core_mod intel_rapl_msr intel_rapl_common ib_iser libiscsi
scsi_transport_iscsi rdma_cm ib_cm i10nm_edac iw_cm nfit libnvdimm
x86_pkg_temp_thermal intel_powerclamp coretemp ipmi_ssif kvm_intel kvm
irdma iTCO_wdt iTCO_vendor_support i40e irqbypass crct10dif_pclmul
crc32_pclmul ib_uverbs ghash_clmulni_intel rapl intel_cstate ib_core
intel_uncore wmi_bmof pcspkr mei_me isst_if_mbox_pci isst_if_mmio
acpi_ipmi isst_if_common ipmi_si i2c_i801 mei intel_pch_thermal
i2c_smbus ipmi_devintf ipmi_msghandler acpi_power_meter ip_tables xfs
libcrc32c sd_mod t10_pi crc64_rocksoft crc64 sg mgag200 i2c_algo_bit
drm_shmem_helper drm_kms_helper syscopyarea sysfillrect ice
kernel: sysimgblt fb_sys_fops ahci drm libahci crc32c_intel libata
megaraid_sas tg3 wmi dm_mirror dm_region_hash dm_log dm_mod fuse [last
unloaded: ip6_udp_tunnel]
kernel: CPU: 74 PID: 38510 Comm: rping Kdump: loaded Tainted: G S      W  OE 5.18.0.RXE #14
kernel: Hardware name: Dell Inc. PowerEdge R750/06V45N, BIOS 1.2.4 05/28/2021
kernel: RIP: 0010:rxe_responder+0x1d67/0x1dd0 [rdma_rxe]
kernel: Code: 24 30 48 89 44 24 30 49 8b 86 88 00 00 00 48 89 44 24 38 48 8b 73 20 48 8b 43 18 ff d0 0f 1f 00 e9 10 e3 ff ff e8 e9 52 98 ee <0f> 0b 45 8b 86 f0 00 00 00 48 8b 8c 24 e0 00 00 00 ba 01 03 00 00
kernel: RSP: 0018:ff5f5b78c7624e70 EFLAGS: 00010246
kernel: RAX: ff20346c70a1d700 RBX: ff20346c7127c040 RCX: ff20346c70a1d700
kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: ff20346c53194000
kernel: RBP: 0000000000000040 R08: 2ebbb556a556fe7f R09: 69de575d0320dc48
kernel: R10: ff5f5b78c7624de0 R11: 00000000ee4984a4 R12: ff20346c70a1d700
kernel: R13: 0000000000000000 R14: ff20346ef0539000 R15: ff20346c70a1c528
kernel: FS:  00007ff34d49b740(0000) GS:ff20347b3fa80000(0000) knlGS:0000000000000000
kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 00007ff40be030c0 CR3: 00000003d0634005 CR4: 0000000000771ee0
kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
kernel: PKRU: 55555554
kernel: Call Trace:
kernel: <IRQ>
kernel: ? __local_bh_enable_ip+0x9f/0xe0
kernel: ? rxe_do_task+0x67/0xe0 [rdma_rxe]
kernel: ? __local_bh_enable_ip+0x77/0xe0
kernel: rxe_do_task+0x71/0xe0 [rdma_rxe]
kernel: tasklet_action_common.isra.15+0xb8/0xf0
kernel: __do_softirq+0xe4/0x48c
kernel: ? rxe_do_task+0x67/0xe0 [rdma_rxe]
kernel: do_softirq+0xb5/0x100
kernel: </IRQ>
kernel: <TASK>
kernel: __local_bh_enable_ip+0xd0/0xe0
kernel: rxe_do_task+0x67/0xe0 [rdma_rxe]
kernel: rxe_post_send+0x2ff/0x4c0 [rdma_rxe]
kernel: ? rdma_lookup_get_uobject+0x131/0x1e0 [ib_uverbs]
kernel: ib_uverbs_post_send+0x4d5/0x700 [ib_uverbs]
kernel: ib_uverbs_write+0x38f/0x5e0 [ib_uverbs]
kernel: ? find_held_lock+0x2d/0x90
kernel: vfs_write+0xb8/0x370
kernel: ksys_write+0xbb/0xd0
kernel: ? syscall_trace_enter.isra.15+0x169/0x220
kernel: do_syscall_64+0x37/0x80
Zhu Yanjun
This could be caused by a race between shutting down the qp and finishing up an RDMA read.
The responder resources state machine is completely unprotected from
simultaneous access by verbs code and bh code in rxe_resp.c.
rxe_resp is a tasklet, so all the accesses from there are serialized,
but if anyone makes a verbs call that touches the responder resources it could cause problems.
The most likely (only?) place this could happen is qp shutdown.
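One way to close that window, sketched here with a hypothetical resp_lock field (nothing like it exists in the driver today), would be to serialize both sides on a spinlock taken with bottom halves disabled:

	/* hypothetical: a lock in struct rxe_qp guarding qp->resp.res and qp->resp.mr */
	spinlock_t resp_lock;

	/* tasklet (bh) side, e.g. around the responder resource use in read_reply() */
	spin_lock_bh(&qp->resp_lock);
	/* ... read and update qp->resp.res and qp->resp.mr ... */
	spin_unlock_bh(&qp->resp_lock);

	/* verbs side, e.g. in qp destroy */
	spin_lock_bh(&qp->resp_lock);
	/* ... release the responder resources ... */
	spin_unlock_bh(&qp->resp_lock);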
Bob
diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
index 60a31b718774..66184f5a4ddf 100644
--- a/drivers/infiniband/sw/rxe/rxe_mr.c
+++ b/drivers/infiniband/sw/rxe/rxe_mr.c
@@ -489,6 +489,7 @@ int copy_data(
 		if (bytes > 0) {
 			iova = sge->addr + offset;
+			WARN_ON(!mr);
 			err = rxe_mr_copy(mr, iova, addr, bytes, dir);
 			if (err)
 				goto err2;
diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
index 1d95fab606da..6e3e86bdccd7 100644
--- a/drivers/infiniband/sw/rxe/rxe_resp.c
+++ b/drivers/infiniband/sw/rxe/rxe_resp.c
@@ -536,6 +536,7 @@ static enum resp_states write_data_in(struct rxe_qp *qp,
 	int err;
 	int data_len = payload_size(pkt);
+	WARN_ON(!qp->resp.mr);
 	err = rxe_mr_copy(qp->resp.mr, qp->resp.va + qp->resp.offset,
 			  payload_addr(pkt), data_len, RXE_TO_MR_OBJ);
 	if (err) {
@@ -772,6 +773,7 @@ static enum resp_states read_reply(struct rxe_qp *qp,
 	if (!skb)
 		return RESPST_ERR_RNR;
+	WARN_ON(!mr);
 	err = rxe_mr_copy(mr, res->read.va, payload_addr(&ack_pkt),
 			  payload, RXE_FROM_MR_OBJ);
 	if (err)
When you run rping, are you going between two machines? It doesn't work in loopback as far as I can tell.
Sorry for the late reply.
With two machines or in loopback, after rping runs for a long time
(about 10 minutes or so), the crash occurs.
Zhu Yanjun
Bob