Hello

https://www.spinics.net/lists/linux-rdma/msg51334.html

A RHEL 7.5 system with backports from upstream is hitting this.
Chuck reported it, and Sagi and Max responded, but it's not clear whether we ever fixed this.

In this case we end up in a panic, not just messaging, although the messages were logged
over and over for a long time until we finally panicked.

crash> log | grep "memreg failure: memor" | wc -l
2414

crash> log
[1635578.012721] connection16:0: detected conn error (1011)
[1635587.050688] mlx5_0:dump_cqe:262:(pid 93128): dump error cqe
[1635587.089686] 00000000 00000000 00000000 00000000
[1635587.123989] 00000000 00000000 00000000 00000000
[1635587.157494] 00000000 00000000 00000000 00000000
[1635587.190968] 00000000 08007806 250002ad ba6115d3
[1635587.224331] iser: iser_err_comp: memreg failure: memory management operation error (6) vend_err 78
[1635587.278876] connection15:0: detected conn error (1011)
[1635590.986286] mlx5_1:dump_cqe:262:(pid 0): dump error cqe
[1635591.021891] 00000000 00000000 00000000 00000000
[1635591.053944] 00000000 00000000 00000000 00000000
[1657077.997960] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
[1657077.997967] IP: [<ffffffffc08a541e>] iscsi_verify_itt+0x1e/0x110 [libiscsi]
[1657077.997970] PGD 80000098de387067 PUD b8d9ffa067 PMD 0
[1657077.997971] Oops: 0000 [#1] SMP
[1657077.998009] Modules linked in: oracleasm(O) nfsv3 rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache dm_round_robin bonding rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm mlx5_ib ib_core vfat fat xfs sb_edac edac_core intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass iTCO_wdt crc32_pclmul ipmi_ssif iTCO_vendor_support ghash_clmulni_intel aesni_intel lrw gf128mul ipmi_si glue_helper ablk_helper cryptd sg hpwdt hpilo pcspkr ipmi_devintf ioatdma dm_multipath i2c_i801 lpc_ich shpchp dca wmi ipmi_msghandler pcc_cpufreq acpi_power_meter nfsd binfmt_misc auth_rpcgss nfs_acl lockd grace sunrpc ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic
[1657077.998020] i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm bnx2x mlx5_core crct10dif_pclmul mdio tg3(OE) devlink libcrc32c crct10dif_common drm hpsa(OE) ptp i2c_core crc32c_intel scsi_transport_sas pps_core dm_mirror dm_region_hash dm_log dm_mod
[1657077.998023] CPU: 20 PID: 41538 Comm: sh Tainted: G OE - ----------- 3.10.0-693.34.1.el7_bz1582551.x86_64 #1
[1657077.998024] Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 05/21/2018
[1657077.998025] task: ffff88587ce38fd0 ti: ffff884dd0af0000 task.ti: ffff884dd0af0000
[1657077.998029] RIP: 0010:[<ffffffffc08a541e>] [<ffffffffc08a541e>] iscsi_verify_itt+0x1e/0x110 [libiscsi]
[1657077.998030] RSP: 0000:ffff88beff403d78 EFLAGS: 00010286
[1657077.998031] RAX: 000000000000004c RBX: 00000000b0000036 RCX: 0000000000000002
[1657077.998032] RDX: 00000000000000cc RSI: 00000000b0000036 RDI: 0000000000000000
[1657077.998033] RBP: ffff88beff403da0 R08: 0000000040032a20 R09: ffff8896e4eaf91c
[1657077.998034] R10: 0000000000000000 R11: 00007ffff7763ca0 R12: 0000000000000000
[1657077.998035] R13: ffff8896e4eaf9e4 R14: ffff8896e4eaf900 R15: 0000000000000000
[1657077.998036] FS: 00007ffff7fe6740(0000) GS:ffff88beff400000(0000) knlGS:0000000000000000
[1657077.998038] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[1657077.998039] CR2: 0000000000000010 CR3: 000000ad92eba000 CR4: 00000000003607e0
[1657077.998040] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[1657077.998041] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[1657077.998042] Call Trace:
[1657077.998044] <IRQ>
[1657077.998046] [<ffffffffc08a5527>] iscsi_itt_to_ctask+0x17/0x80 [libiscsi]
[1657077.998050] [<ffffffffc05eefea>] iser_task_rsp+0xca/0x360 [ib_iser]
[1657077.998061] [<ffffffffc0587fbb>] __ib_process_cq+0x6b/0xe0 [ib_core]
[1657077.998066] [<ffffffffc0588122>] ib_poll_handler+0x22/0x80 [ib_core]
[1657077.998070] [<ffffffff81358507>] irq_poll_softirq+0xc7/0x100
[1657077.998076] [<ffffffff81095195>] __do_softirq+0xf5/0x280
[1657077.998081] [<ffffffff816c4e8c>] call_softirq+0x1c/0x30
[1657077.998086] [<ffffffff8102d435>] do_softirq+0x65/0xa0
[1657077.998088] [<ffffffff81095515>] irq_exit+0x105/0x110
[1657077.998091] [<ffffffff816c61d6>] do_IRQ+0x56/0xf0
[1657077.998098] [<ffffffff816b837c>] common_interrupt+0x17c/0x17c
[1657077.998099] <EOI>
[1657077.998113] Code: ff ff ff eb a9 41 be 95 ff ff ff eb a1 0f 1f 44 00 00 55 48 89 e5 41 55 41 54 49 89 fc 53 89 f3 48 83 ec 10 c7 45 d8 00 00 00 00 <4c> 8b 6f 10 65 48 8b 04 25 28 00 00 00 48 89 45 e0 31 c0 83 fe
[1657077.998116] RIP [<ffffffffc08a541e>] iscsi_verify_itt+0x1e/0x110 [libiscsi]
[1657077.998116] RSP <ffff88beff403d78>
[1657077.998117] CR2: 0000000000000010
crash>

crash> bt
PID: 41538 TASK: ffff88587ce38fd0 CPU: 20 COMMAND: "sh"
 #0 [ffff88beff403a18] machine_kexec at ffffffff8105ddeb
 #1 [ffff88beff403a78] __crash_kexec at ffffffff81109902
 #2 [ffff88beff403b48] crash_kexec at ffffffff811099f0
 #3 [ffff88beff403b60] oops_end at ffffffff816b97a8
 #4 [ffff88beff403b88] no_context at ffffffff816a8c96
 #5 [ffff88beff403bd8] __bad_area_nosemaphore at ffffffff816a8d2c
 #6 [ffff88beff403c20] bad_area_nosemaphore at ffffffff816a8e96
 #7 [ffff88beff403c30] __do_page_fault at ffffffff816bc6be
 #8 [ffff88beff403c90] do_page_fault at ffffffff816bc865
 #9 [ffff88beff403cc0] page_fault at ffffffff816b8788
    [exception RIP: iscsi_verify_itt+30]
    RIP: ffffffffc08a541e RSP: ffff88beff403d78 RFLAGS: 00010286
    RAX: 000000000000004c RBX: 00000000b0000036 RCX: 0000000000000002
    RDX: 00000000000000cc RSI: 00000000b0000036 RDI: 0000000000000000
    RBP: ffff88beff403da0 R8: 0000000040032a20 R9: ffff8896e4eaf91c
    R10: 0000000000000000 R11: 00007ffff7763ca0 R12: 0000000000000000
    R13: ffff8896e4eaf9e4 R14: ffff8896e4eaf900 R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0000
#10 [ffff88beff403da8] iscsi_itt_to_ctask at ffffffffc08a5527 [libiscsi]
#11 [ffff88beff403dc8] iser_task_rsp at ffffffffc05eefea [ib_iser]
#12 [ffff88beff403e10] __ib_process_cq at ffffffffc0587fbb [ib_core]
#13 [ffff88beff403e50] ib_poll_handler at ffffffffc0588122 [ib_core]
#14 [ffff88beff403e80] irq_poll_softirq at ffffffff81358507
#15 [ffff88beff403eb8] __do_softirq at ffffffff81095195
#16 [ffff88beff403f28] call_softirq at ffffffff816c4e8c
#17 [ffff88beff403f40] do_softirq at ffffffff8102d435
#18 [ffff88beff403f60] irq_exit at ffffffff81095515
#19 [ffff88beff403f78] do_IRQ at ffffffff816c61d6
--- <IRQ stack> ---
#20 [ffff884dd0af3f58] ret_from_intr at ffffffff816b837c
    RIP: 000000000041b866 RSP: 00007fffffffea28 RFLAGS: 00000206
    RAX: 0000000000000000 RBX: 00007fffffffef53 RCX: 00000000006f1a70
    RDX: 00000000006f1a70 RSI: 00000000006f1a90 RDI: 0000000000000000
    RBP: 0000000000000002 R8: 0000000000000001 R9: 0000000000000020
    R10: 0000000000000003 R11: 00007ffff7763ca0 R12: ffff88beff4061e8
    R13: 00000000ffffffff R14: 0000000000000000 R15: 0000000000000063
    ORIG_RAX: ffffffffffffffbb CS: 0033 SS: 002b

crash> ps -p 41538
PID: 0 TASK: ffffffff81a0e480 CPU: 0 COMMAND: "swapper/0"
 PID: 1 TASK: ffff88012e4c8000 CPU: 7 COMMAND: "systemd"
  PID: 2345 TASK: ffff885ef5eb8fd0 CPU: 14 COMMAND: "zabbix_agentd"
   PID: 2349 TASK: ffff885efcbcaf70 CPU: 1 COMMAND: "zabbix_agentd"
    PID: 41538 TASK: ffff88587ce38fd0 CPU: 20 COMMAND: "sh"
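
For reference, the "(6) vend_err 78" pair in the iser message can be read straight out of the last row of the first CQE dump in the log. The following is a minimal sketch, not driver code. It assumes the last 16 bytes of the 64-byte error CQE are laid out as in struct mlx5_err_cqe from the upstream include/linux/mlx5/device.h (6 reserved bytes, vendor_err_synd, syndrome, s_wqe_opcode_qpn, wqe_counter, signature, op_own) and that dump_cqe prints each 32-bit word in big-endian order. The helper name decode_err_cqe_tail is hypothetical and used only for illustration.

```python
import struct

# Fourth row (byte offsets 48..63) of the first "dump error cqe" in the log.
LAST_ROW = "00000000 08007806 250002ad ba6115d3"


def decode_err_cqe_tail(row: str) -> dict:
    """Decode the last 16 bytes of an mlx5 error CQE (layout is an assumption)."""
    raw = bytes.fromhex(row.replace(" ", ""))
    if len(raw) != 16:
        raise ValueError("expected exactly 16 bytes (four 32-bit words)")

    # Assumed layout for offsets 48..63, all fields big-endian:
    #   6 reserved bytes, u8 vendor_err_synd, u8 syndrome,
    #   be32 s_wqe_opcode_qpn, be16 wqe_counter, u8 signature, u8 op_own
    _rsvd, vend, synd, opcode_qpn, wqe_counter, signature, op_own = struct.unpack(
        ">6sBBIHBB", raw
    )

    return {
        "vendor_err_synd": vend,
        "syndrome": synd,
        # Assumed split: top byte is the WQE opcode, low 24 bits are the QP number.
        "wqe_opcode": opcode_qpn >> 24,
        "qpn": opcode_qpn & 0xFFFFFF,
        "wqe_counter": wqe_counter,
        "signature": signature,
        # Assumed split: high nibble of op_own is the CQE opcode.
        "cqe_opcode": op_own >> 4,
    }


if __name__ == "__main__":
    for name, value in decode_err_cqe_tail(LAST_ROW).items():
        print(f"{name:16s} 0x{value:x}")
```

Under those assumptions the syndrome byte decodes to 0x06 and the vendor syndrome to 0x78, which agrees with the "memory management operation error (6) vend_err 78" line that iser_err_comp logged right after the dump.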