ipoib crashes in rdma_put_gid_attr.

The crash is reproducible on every boot: usually during the boot process or
shortly after it, occasionally a few hours after a reboot. After the crash,
IPoIB stops working for new connections. Interestingly, TCP sessions
established before the crash continue to work.

The problem occurs on four servers. The servers are running Fedora 29 with
kernel 4.19.10-300.fc29.x86_64. Note that 4.19.8-300.fc29.x86_64 has the same
problem.

All four servers use the same InfiniBand controller model, OS, kernel, and
driver:

Device:   02:00.0 InfiniBand: Mellanox Technologies MT25208 [InfiniHost III Ex] (rev a0)
Firmware: 5.3.0
Driver:   ib_mthca: Mellanox InfiniBand HCA driver v1.0 (April 4, 2008)

More details at https://bugzilla.redhat.com/show_bug.cgi?id=1661864

-------------------------------------------------------------------------

Additional info:
reporter: libreport-2.9.7

general protection fault: 0000 [#1] SMP NOPTI
CPU: 3 PID: 74 Comm: kworker/u16:1 Not tainted 4.19.10-300.fc29.x86_64 #1
Hardware name: To be filled by O.E.M. To be filled by O.E.M./M5A99X EVO, BIOS 0402 05/16/2011
Workqueue: ipoib_wq ipoib_cm_tx_reap [ib_ipoib]
RIP: 0010:rdma_put_gid_attr+0x9/0x30 [ib_core]
Code: 96 5b 5d 41 5c 41 5d 41 5e 41 5f c3 48 8b 7b 30 e8 cc 0d c6 f1 48 89 df e8 c4 0d c6 f1 eb c3 c3 90 0f 1f 44 00 00 48 8d 57 d8 <f0> ff 4f d8 0f 88 78 65 01 00 74 01 c3 48 8b 35 2b d0 02 00 48 83
RSP: 0018:ffffb7ad819dbde8 EFLAGS: 00010202
RAX: 0000000000000000 RBX: ffff8d1bdf5a2e00 RCX: 0000000000002699
RDX: 206c656e72656af8 RSI: ffff8d1bf7ae6160 RDI: 206c656e72656b20
RBP: 0000000000000000 R08: 0000000000026160 R09: ffffffffc06b45bf
R10: ffffe849887da000 R11: 0000000000000002 R12: ffff8d1be30cb400
R13: ffff8d1bdf681800 R14: ffff8d1be2272400 R15: ffff8d1be30ca000
FS:  0000000000000000(0000) GS:ffff8d1bf7ac0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f4f99d5dc80 CR3: 000000021878e000 CR4: 00000000000006e0
Call Trace:
 ib_destroy_qp+0xc9/0x240 [ib_core]
 ipoib_cm_tx_reap+0x1f9/0x4e0 [ib_ipoib]
 process_one_work+0x1a1/0x3a0
 worker_thread+0x30/0x380
 ? pwq_unbound_release_workfn+0xd0/0xd0
 kthread+0x112/0x130
 ? kthread_create_worker_on_cpu+0x70/0x70
 ret_from_fork+0x22/0x40
Modules linked in: nf_log_ipv4 nf_log_common xt_LOG xt_limit xt_multiport
 8021q garp mrp stp llc ip6t_REJECT nf_reject_ipv6 xt_state xt_conntrack
 nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c it87 hwmon_vid
 ip6table_filter ip6_tables ib_isert iscsi_target_mod ib_srpt target_core_mod
 ib_srp scsi_transport_srp rpcrdma rdma_ucm ib_uverbs ib_iser ib_umad rdma_cm
 iw_cm ib_ipoib libiscsi scsi_transport_iscsi ib_cm eeepc_wmi amd64_edac_mod
 asus_wmi edac_mce_amd sparse_keymap rfkill kvm_amd video wmi_bmof mxm_wmi kvm
 irqbypass k10temp snd_hda_codec_realtek snd_hda_codec_hdmi
 snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core ib_mthca
 sp5100_tco snd_seq snd_hwdep snd_seq_device i2c_piix4 snd_pcm ib_core
 snd_timer snd soundcore wmi pcc_cpufreq acpi_cpufreq nfsd binfmt_misc nfs_acl
 lockd grace auth_rpcgss sunrpc dm_crypt raid1 ata_generic i2c_algo_bit uas
 drm_kms_helper pata_acpi ttm usb_storage pata_marvell drm firewire_ohci
 firewire_core crc_itu_t r8169 ecryptfs

---------------------------------------------------

# lspci | grep Mellanox
02:00.0 InfiniBand: Mellanox Technologies MT25208 [InfiniHost III Ex] (rev a0)

# ibv_devinfo
hca_id: mthca0
        transport:              InfiniBand (0)
        fw_ver:                 5.3.0
        node_guid:              0002:c902:0022:1228
        sys_image_guid:         0005:ad00:0100:d050
        vendor_id:              0x02c9
        vendor_part_id:         25218
        hw_ver:                 0xA0
        board_id:               MT_0150000001
        phys_port_cnt:          2
                port:   1
                        state:          PORT_ACTIVE (4)
                        max_mtu:        2048 (4)
                        active_mtu:     2048 (4)
                        sm_lid:         2
                        port_lid:       5
                        port_lmc:       0x00
                        link_layer:     InfiniBand

                port:   2
                        state:          PORT_ACTIVE (4)
                        max_mtu:        2048 (4)
                        active_mtu:     2048 (4)
                        sm_lid:         2
                        port_lid:       6
                        port_lmc:       0x00
                        link_layer:     InfiniBand
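
---------------------------------------------------

A possible hint from the register dump above, offered as an observation and not as a confirmed diagnosis: RDI (the first argument on x86_64, here the pointer passed to rdma_put_gid_attr) does not look like a kernel address. Read as little-endian bytes it is printable ASCII, and RDX is exactly RDI - 0x28, which matches the `48 8d 57 d8` (lea -0x28(%rdi),%rdx) bytes right before the faulting instruction in the Code line. That would be consistent with the pointer having been overwritten by string data, but the oops alone does not prove where the corruption comes from. The helper name `decode_register` below is made up for this sketch.

```python
# Decode 64-bit register values from the oops as raw bytes.
# Assumption: x86_64 is little-endian, so the in-memory byte order is the
# reverse of the hex digits printed in the register dump.

def decode_register(value: int) -> str:
    """Return the 8 bytes of a 64-bit value as text, '.' for non-printables."""
    raw = value.to_bytes(8, "little")
    return "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in raw)

RDI = 0x206C656E72656B20  # copied from the register dump
RDX = 0x206C656E72656AF8  # copied from the register dump

print(repr(decode_register(RDI)))  # ' kernel '
print(hex(RDI - RDX))              # 0x28
```

The value is also a non-canonical address on x86_64, which fits the "general protection fault" (as opposed to a page fault) reported at the top of the oops.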