Re: [PATCH] Fix bnxt_re crash in bnxt_qplib_process_qp_event

Hi All,

I encountered a similar issue with the bnxt_re driver on Linux 6.12 through 6.14:
the KVM host kernel crashes in bnxt_qplib_process_qp_event on a write access to
an invalid memory address (ffff9f058cedbb10) after performing a few SR-IOV
operations on the guest. It doesn’t happen on Linux 6.11. It doesn’t reproduce
consistently; it happens about 2 out of 5 times.

System details:
- NIC: Broadcom BCM57417 NetXtreme-E 10Gb/25Gb RDMA Ethernet Controller

The crash trace is as follows:
[ 6882.739369] BUG: unable to handle page fault for address: ffff9f058cedbb10
[ 6882.739771] #PF: supervisor write access in kernel mode
[ 6882.740127] #PF: error_code(0x0002) - not-present page
[ 6882.740417] PGD 100000067 P4D 100000067 PUD 1002e3067 PMD 107b10067 PTE 0
[ 6882.740696] Oops: Oops: 0002 [#1] PREEMPT SMP PTI
[ 6882.740971] CPU: 23 UID: 0 PID: 0 Comm: swapper/23 Kdump: loaded Not tainted 6.12.0-0.16.14.el9uek.x86_64 #1
[ 6882.741528] RIP: 0010:bnxt_qplib_process_qp_event.isra.0+0xa5/0x323 [bnxt_re]
[ 6882.741827] Code: 74 0d 80 7d 01 00 75 07 f0 ff 8b d0 02 00 00 41 80 7f 11 00 0f 84 87 00 00 00 49 8b 17 48 85 d2 0f 84 0e 02 00 00 48 8b 4d 00 <48> 89 0a 48 8b 4d 08 48 89 4a 08 44 0f bf e0 41 8b 47 08 41 c7 47
[ 6882.742434] RSP: 0018:ffff9f058cf1ce88 EFLAGS: 00010282
[ 6882.742754] RAX: 0000000000000000 RBX: ffff904ceb600c80 RCX: 0000000000000338
[ 6882.743078] RDX: ffff9f058cedbb10 RSI: 0000000000000000 RDI: 0000000000000000
[ 6882.743395] RBP: ffff9044dc5bd660 R08: 0000000000000000 R09: 0000000000000000
[ 6882.743705] R10: 0000000000000000 R11: 0000000000000000 R12: ffff90434b3f8000
[ 6882.743987] R13: ffff9f058cf1cf14 R14: ffff904ceb600c98 R15: ffff90444df40000
[ 6882.744272] FS:  0000000000000000(0000) GS:ffff908180e80000(0000) knlGS:0000000000000000
[ 6882.744556] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 6882.744839] CR2: ffff9f058cedbb10 CR3: 0000001773e38001 CR4: 00000000007726f0
[ 6882.745125] PKRU: 55555554
[ 6882.745406] Call Trace:
[ 6882.745686]  <IRQ>
[ 6882.745964]  ? show_trace_log_lvl+0x1b0/0x300
[ 6882.746247]  ? show_trace_log_lvl+0x1b0/0x300
[ 6882.746529]  ? bnxt_qplib_service_creq+0x16a/0x236 [bnxt_re]
[ 6882.746821]  ? __die_body.cold+0x8/0x17
[ 6882.747099]  ? page_fault_oops+0x162/0x16d
[ 6882.747397]  ? exc_page_fault+0x16d/0x180
[ 6882.747700]  ? asm_exc_page_fault+0x26/0x30
[ 6882.747975]  ? bnxt_qplib_process_qp_event.isra.0+0xa5/0x323 [bnxt_re]
[ 6882.748250]  ? bnxt_qplib_process_qp_event.isra.0+0x43/0x323 [bnxt_re]
[ 6882.748518]  bnxt_qplib_service_creq+0x16a/0x236 [bnxt_re]
[ 6882.748785]  tasklet_action_common+0xca/0x240
[ 6882.749042]  handle_softirqs+0xe1/0x2ac
[ 6882.749295]  __irq_exit_rcu+0xab/0xd0
[ 6882.749571]  common_interrupt+0x85/0xa0
[ 6882.749835]  </IRQ>
[ 6882.750094]  <TASK>
[ 6882.750350]  asm_common_interrupt+0x26/0x40
[ 6882.750622] RIP: 0010:cpuidle_enter_state+0xc6/0x430
[ 6882.750870] Code: 00 00 e8 dd 82 23 ff e8 38 f1 ff ff 49 89 c5 0f 1f 44 00 00 31 ff e8 79 f2 21 ff 45 84 ff 0f 85 b8 01 00 00 fb 0f 1f 44 00 00 <45> 85 f6 0f 88 92 01 00 00 49 63 d6 48 8d 04 52 48 8d 04 82 49 8d
[ 6882.751411] RSP: 0018:ffff9f05807dfe70 EFLAGS: 00000246
[ 6882.751698] RAX: 0000000000000000 RBX: 0000000000000003 RCX: 0000000000000000
[ 6882.751990] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 6882.752281] RBP: ffff908180ec4f68 R08: 0000000000000000 R09: 0000000000000000
[ 6882.752598] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff89ce0900
[ 6882.752989] R13: 00000642833badfd R14: 0000000000000003 R15: 0000000000000000
[ 6882.753373]  cpuidle_enter+0x2d/0x50
[ 6882.753701]  cpuidle_idle_call+0xfd/0x170
[ 6882.754049]  do_idle+0x7b/0xc0
[ 6882.754333]  cpu_startup_entry+0x29/0x30
[ 6882.754597]  start_secondary+0x11e/0x140
[ 6882.754856]  common_startup_64+0x13e/0x141
[ 6882.755114]  </TASK>
[ 6882.755357] Modules linked in: vfio_pci vfio_pci_core vhost_net vhost vhost_iotlb tap xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nft_compat nf_nat_tftp nf_conntrack_tftp tun nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 rfkill ip_set sunrpc vfat fat intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common skx_edac skx_edac_common nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel bnxt_re iTCO_wdt ipmi_ssif iTCO_vendor_support ib_uverbs kvm pcspkr acpi_ipmi ib_core ipmi_si i2c_i801 lpc_ich i2c_smbus ipmi_devintf ioatdma intel_pch_thermal wmi ipmi_msghandler fuse xfs qla2xxx sd_mod nvme_fc mgag200 sg drm_shmem_helper nvme_fabrics ahci crct10dif_pclmul crc32_pclmul drm_kms_helper nvme libahci nvme_keyring ghash_clmulni_intel i40e drm sha512_ssse3 sha256_ssse3 nvme_core bnxt_en igb libata megaraid_sas scsi_transport_fc sha1_ssse3 nvme_auth
[ 6882.755442]  libie dca i2c_algo_bit dm_mirror dm_region_hash dm_log dm_mod aesni_intel gf128mul crypto_simd cryptd
[ 6882.758069] CR2: ffff9f058cedbb10
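
For anyone reading the trace, the error_code(0x0002) line can be decoded by
hand. Below is a minimal sketch in Python, assuming the usual x86 page-fault
error code bit layout (bit 0: protection violation vs. not-present page,
bit 1: write, bit 2: user mode, bit 3: reserved bit set, bit 4: instruction
fetch). The constant and function names are made up for illustration and are
not kernel APIs.

```python
# Illustrative decoder for the x86 page-fault error code shown in the oops.
# Names below are hypothetical; only the bit layout is taken from x86.
PF_PROT = 1 << 0   # 0: not-present page, 1: protection violation
PF_WRITE = 1 << 1  # 0: read access, 1: write access
PF_USER = 1 << 2   # 0: supervisor (kernel) mode, 1: user mode
PF_RSVD = 1 << 3   # 1: reserved bit set in a paging-structure entry
PF_INSTR = 1 << 4  # 1: fault was caused by an instruction fetch


def decode_pf_error_code(code: int) -> dict:
    """Return a human-readable breakdown of a page-fault error code."""
    return {
        "cause": "protection violation" if code & PF_PROT else "not-present page",
        "access": "write" if code & PF_WRITE else "read",
        "mode": "user" if code & PF_USER else "supervisor",
        "reserved_bit_set": bool(code & PF_RSVD),
        "instruction_fetch": bool(code & PF_INSTR),
    }


if __name__ == "__main__":
    # 0x0002 is the value reported in the trace above.
    print(decode_pf_error_code(0x0002))
```

Decoding 0x0002 this way gives a supervisor-mode write to a not-present page,
which agrees with the two #PF lines in the trace. The faulting address in CR2
(ffff9f058cedbb10) is also the value held in RDX at the time of the fault.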

I would like to know what is causing this issue and whether any workarounds
are available. Please let me know if further debugging logs or tests are
needed.

Thanks,
Sherry





