crash in 4.14-rc1 with IPoIB

Hi folks,

I wanted to try out Christoph's NVMe multipathing patchset on my NVMe OmniPath
setup, so I merged it into 4.14-rc1. On bootup I hit the splat below, and no
RDMA operation was possible:


hfi1 0000:ff:00.0: hfi1_1: send_idle_message: sending idle message 0x203
hfi1 0000:ff:00.0: hfi1_1: Switching to NO_DMA_RTAIL
BUG: unable to handle kernel NULL pointer dereference at           (null)
IP:           (null)
PGD 0 P4D 0
Oops: 0010 [#1] SMP
Modules linked in: iptable_filter(E) af_packet(E) xt_nat(E) xt_tcpudp(E) iscsi_ibft(E) iscsi_boot_sysfs(E) iptable_nat(E) nf_conntrack_ipv4(E) nf_defrag_ipv4(E) nf_nat_ipv4(E) nf_nat(E) nf_conntrack(E) libcrc32c(E) ip_tables(E) x_tables(E) rpcrdma(E) ib_isert(E) iscsi_target_mod(E) ib_iser(E) libiscsi(E) scsi_transport_iscsi(E) ib_srpt(E) target_core_mod(E) nls_iso8859_1(E) nls_cp437(E) vfat(E) fat(E) ib_srp(E) scsi_transport_srp(E) ib_ipoib(E) rdma_ucm(E) ib_ucm(E) ib_uverbs(E) ib_umad(E) rdma_cm(E) configfs(E) ib_cm(E) iw_cm(E) mlx5_ib(E) intel_rapl(E) sha512_ssse3(E) skx_edac(E) sha512_generic(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) coretemp(E) kvm_intel(E) kvm(E) irqbypass(E) crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E) ipmi_ssif(E) pcbc(E) aesni_intel(E) mlx5_core(E)
 qat_c62x(E) aes_x86_64(E) intel_qat(E) mlxfw(E) joydev(E) hfi1(E) i40e(E) crypto_simd(E) devlink(E) rdmavt(E) ipmi_si(E) ptp(E) iTCO_wdt(E) dh_generic(E) glue_helper(E) iTCO_vendor_support(E) authenc(E) ib_core(E) pps_core(E) ipmi_devintf(E) mei_me(E) ioatdma(E) cryptd(E) lpc_ich(E) pcspkr(E) mfd_core(E) i2c_i801(E) shpchp(E) mei(E) dca(E) ipmi_msghandler(E) tpm_crb(E) nfit(E) libnvdimm(E) acpi_pad(E) sunrpc(E) btrfs(E) xor(E) zstd_decompress(E) zstd_compress(E) xxhash(E) hid_generic(E) usbhid(E) raid6_pq(E) sd_mod(E) sr_mod(E) cdrom(E) crc32c_intel(E) ast(E) i2c_algo_bit(E) drm_kms_helper(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) fb_sys_fops(E) ttm(E) xhci_pci(E) ahci(E) xhci_hcd(E) libahci(E) drm(E) usbcore(E) libata(E) wmi(E) button(E) sg(E) dm_multipath(E) dm_mod(E) scsi_dh_rdac(E)
 scsi_dh_emc(E) scsi_dh_alua(E) scsi_mod(E) efivarfs(E) autofs4(E)
CPU: 20 PID: 950 Comm: kworker/20:1H Tainted: G            E   4.14.0-rc1-6.3-default-nvme-mpath #773
 Hardware name: Intel Corporation S2600WFD/S2600WFD, BIOS SE5C620.86B.01.00.0412.020920172159 02/09/2017
 Workqueue: ib-comp-wq ib_cq_poll_work [ib_core]
 task: ffff882fce3f4b00 task.stack: ffffc9002422c000
 RIP: 0010:          (null)
 RSP: 0018:ffffc9002422f990 EFLAGS: 00010206
 RAX: ffff882fd0078000 RBX: ffff882fa0263000 RCX: ffffc9002422f998
 RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff882fd0078000
 RBP: ffffc9002422fad0 R08: 0000000000000000 R09: ffff882fa0263080
 R10: ffffffffa0964ca0 R11: 0000000000000000 R12: ffff8817dcea3700
 R13: ffff882fa0263000 R14: 000000000000c000 R15: 000000000000c000
 FS:  0000000000000000(0000) GS:ffff882fdd000000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000000 CR3: 00000017db346004 CR4: 00000000007606e0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 PKRU: 55555554
 Call Trace:
  ? is_valid_mcast_lid.isra.23+0xfb/0x110 [ib_core]
  ib_attach_mcast+0x6f/0xa0 [ib_core]
  ipoib_mcast_attach+0x72/0x160 [ib_ipoib]
  ipoib_mcast_join_complete+0x354/0xb40 [ib_ipoib]
  mcast_work_handler+0x2ff/0x630 [ib_core]
  join_handler+0xf0/0x1e0 [ib_core]
  ib_sa_mcmember_rec_callback+0x54/0x80 [ib_core]
  recv_handler+0x3a/0x60 [ib_core]
  ib_mad_recv_done+0x43d/0xa20 [ib_core]
  __ib_process_cq+0x5d/0xb0 [ib_core]
  ib_cq_poll_work+0x20/0x60 [ib_core]
  process_one_work+0x138/0x370
  worker_thread+0x4d/0x3b0
  kthread+0x109/0x140
  ? rescuer_thread+0x320/0x320
  ? kthread_park+0x60/0x60
  ret_from_fork+0x25/0x30
 Code:  Bad RIP value.
 RIP:           (null) RSP: ffffc9002422f990
 CR2: 0000000000000000
 ---[ end trace f3c2d0cdf0ebfb9c ]---

The first frame in the call trace, is_valid_mcast_lid.isra.23+0xfb/0x110,
resolves to the following source line:

(gdb) l *(is_valid_mcast_lid+0xfb)
0x229b is in is_valid_mcast_lid (drivers/infiniband/core/verbs.c:1649).
1644		/* If QP state >= init, it is assigned to a port and we can check this
1645		 * port only.
1646		 */
1647		if (!ib_query_qp(qp, &attr, IB_QP_STATE | IB_QP_PORT, &init_attr)) {
1648			if (attr.qp_state >= IB_QPS_INIT) {
1649				if (qp->device->get_link_layer(qp->device, attr.port_num) !=
1650				    IB_LINK_LAYER_INFINIBAND)
1651					return true;
1652				goto lid_check;
1653			}
(gdb) 
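
One possible reading of the above: RIP and CR2 are both NULL and the oops
error code is 0010 (instruction fetch), which is consistent with an indirect
call through a NULL function pointer, and the only indirect call on
verbs.c:1649 is ->get_link_layer(). Assuming the driver simply leaves that
optional callback unset (not verified here), the usual pattern is to guard
the call and fall back to a default. Below is a minimal user-space sketch of
that pattern. The types, the default_layer field and the function names are
hypothetical stand-ins, not the kernel definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; NOT the real definitions. */
enum link_layer { LL_UNSPECIFIED, LL_INFINIBAND, LL_ETHERNET };

struct device {
	/* Optional callback: a driver may legitimately leave this NULL. */
	enum link_layer (*get_link_layer)(struct device *dev, unsigned char port);
	/* Assumed fallback value used when no callback is provided. */
	enum link_layer default_layer;
};

/* Guarded lookup: never calls through a NULL function pointer. */
enum link_layer port_link_layer(struct device *dev, unsigned char port)
{
	assert(dev != NULL);

	if (dev->get_link_layer)
		return dev->get_link_layer(dev, port);

	return dev->default_layer;
}

/* Example callback for a device that does implement the hook. */
enum link_layer eth_cb(struct device *dev, unsigned char port)
{
	(void)dev;
	(void)port;
	return LL_ETHERNET;
}
```

With a device whose get_link_layer is NULL, port_link_layer() returns the
default instead of jumping to address 0. As far as I know, ib_core's
rdma_port_get_link_layer() helper does a similar check before calling the
driver hook, so it may be the better call to use at this spot.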

Byte,
	Johannes
-- 
Johannes Thumshirn                                          Storage
jthumshirn@xxxxxxx                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850