ib_ipoib: general protection fault in ib_destroy_qp -> rdma_put_gid_attr+0x9/0x30 [ib_core]

IPoIB crashes in rdma_put_gid_attr. The crash occurs after every reboot: usually during the boot process or shortly after it, occasionally a few hours after the reboot.

After the crash, IPoIB stops working for new connections. Interestingly, TCP sessions established before the crash continue to work.

The problem occurs on four (4) servers. The servers are running Fedora 29 with kernel 4.19.10-300.fc29.x86_64. Note that 4.19.8-300.fc29.x86_64 has the same problem.

All four servers use the same InfiniBand controller model, OS, kernel, and drivers:

Device:   02:00.0 InfiniBand: Mellanox Technologies MT25208 [InfiniHost III Ex] (rev a0)
Firmware: 5.3.0
Driver:   ib_mthca: Mellanox InfiniBand HCA driver v1.0 (April 4, 2008)

More details at https://bugzilla.redhat.com/show_bug.cgi?id=1661864

-------------------------------------------------------------------------
Additional info:
reporter:       libreport-2.9.7
general protection fault: 0000 [#1] SMP NOPTI
CPU: 3 PID: 74 Comm: kworker/u16:1 Not tainted 4.19.10-300.fc29.x86_64 #1
Hardware name: To be filled by O.E.M. To be filled by O.E.M./M5A99X EVO, BIOS 0402 05/16/2011
Workqueue: ipoib_wq ipoib_cm_tx_reap [ib_ipoib]
RIP: 0010:rdma_put_gid_attr+0x9/0x30 [ib_core]
Code: 96 5b 5d 41 5c 41 5d 41 5e 41 5f c3 48 8b 7b 30 e8 cc 0d c6 f1 48 89 df e8 c4 0d c6 f1 eb c3 c3 90 0f 1f 44 00 00 48 8d 57 d8 <f0> ff 4f d8 0f 88 78 65 01 00 74 01 c3 48 8b 35 2b d0 02 00 48 83
RSP: 0018:ffffb7ad819dbde8 EFLAGS: 00010202
RAX: 0000000000000000 RBX: ffff8d1bdf5a2e00 RCX: 0000000000002699
RDX: 206c656e72656af8 RSI: ffff8d1bf7ae6160 RDI: 206c656e72656b20
RBP: 0000000000000000 R08: 0000000000026160 R09: ffffffffc06b45bf
R10: ffffe849887da000 R11: 0000000000000002 R12: ffff8d1be30cb400
R13: ffff8d1bdf681800 R14: ffff8d1be2272400 R15: ffff8d1be30ca000
FS:  0000000000000000(0000) GS:ffff8d1bf7ac0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f4f99d5dc80 CR3: 000000021878e000 CR4: 00000000000006e0
Call Trace:
 ib_destroy_qp+0xc9/0x240 [ib_core]
 ipoib_cm_tx_reap+0x1f9/0x4e0 [ib_ipoib]
 process_one_work+0x1a1/0x3a0
 worker_thread+0x30/0x380
 ? pwq_unbound_release_workfn+0xd0/0xd0
 kthread+0x112/0x130
 ? kthread_create_worker_on_cpu+0x70/0x70
 ret_from_fork+0x22/0x40
Modules linked in: nf_log_ipv4 nf_log_common xt_LOG xt_limit xt_multiport 8021q garp mrp stp llc ip6t_REJECT nf_reject_ipv6 xt_state xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c it87 hwmon_vid ip6table_filter ip6_tables ib_isert iscsi_target_mod ib_srpt target_core_mod ib_srp scsi_transport_srp rpcrdma rdma_ucm ib_uverbs ib_iser ib_umad rdma_cm iw_cm ib_ipoib libiscsi scsi_transport_iscsi ib_cm eeepc_wmi amd64_edac_mod asus_wmi edac_mce_amd sparse_keymap rfkill kvm_amd video wmi_bmof mxm_wmi kvm irqbypass k10temp snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core ib_mthca sp5100_tco snd_seq snd_hwdep snd_seq_device i2c_piix4 snd_pcm ib_core snd_timer snd soundcore wmi pcc_cpufreq acpi_cpufreq nfsd binfmt_misc nfs_acl
 lockd grace auth_rpcgss sunrpc dm_crypt raid1 ata_generic i2c_algo_bit uas drm_kms_helper pata_acpi ttm usb_storage pata_marvell drm firewire_ohci firewire_core crc_itu_t r8169 ecryptfs
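
A note on reading the register dump above, offered as a rough sketch rather than a diagnosis. The faulting bytes `<f0> ff 4f d8` decode to `lock dec` on `[rdi-0x28]`, preceded by `lea rdx,[rdi-0x28]`, so RDI is the pointer passed to rdma_put_gid_attr. The small Python script below (the helper names `register_as_ascii` and `is_canonical` are made up for illustration) shows that RDI is not a canonical x86_64 address, which by itself explains the general protection fault, and that its bytes read as printable ASCII. That pattern may point to the pointer having been overwritten with string data or read from freed or uninitialized memory, but the trace alone does not prove it.

```python
import struct

def register_as_ascii(value: int) -> str:
    """Render a 64-bit register value as the 8 bytes it occupies in
    little-endian memory, replacing non-printable bytes with '.'."""
    raw = struct.pack("<Q", value)
    return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)

def is_canonical(addr: int) -> bool:
    """True if bits 63..47 are all equal (48-bit virtual addressing)."""
    top = addr >> 47
    return top == 0 or top == 0x1FFFF

# Values copied from the oops above.
RDI = 0x206c656e72656b20
RDX = 0x206c656e72656af8
RBX = 0xffff8d1bdf5a2e00  # an ordinary kernel pointer, for comparison

print(repr(register_as_ascii(RDI)))  # ' kernel '
print(is_canonical(RDI))             # False
print(is_canonical(RBX))             # True
print(hex(RDI - RDX))                # 0x28, matching lea rdx,[rdi-0x28]
```
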

---------------------------------------------------

# lspci | grep Mellanox
02:00.0 InfiniBand: Mellanox Technologies MT25208 [InfiniHost III Ex] (rev a0)

# ibv_devinfo 
hca_id:	mthca0
	transport:			InfiniBand (0)
	fw_ver:				5.3.0
	node_guid:			0002:c902:0022:1228
	sys_image_guid:			0005:ad00:0100:d050
	vendor_id:			0x02c9
	vendor_part_id:			25218
	hw_ver:				0xA0
	board_id:			MT_0150000001
	phys_port_cnt:			2
		port:	1
			state:			PORT_ACTIVE (4)
			max_mtu:		2048 (4)
			active_mtu:		2048 (4)
			sm_lid:			2
			port_lid:		5
			port_lmc:		0x00
			link_layer:		InfiniBand

		port:	2
			state:			PORT_ACTIVE (4)
			max_mtu:		2048 (4)
			active_mtu:		2048 (4)
			sm_lid:			2
			port_lid:		6
			port_lmc:		0x00
			link_layer:		InfiniBand





