Re: When upgrading to kernel 6.12.4, I got the following error with rdma_rxe module.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



在 2024/12/25 21:04, Yiyi Hu 写道:
Hi, I'm using 6.6.x kernel, after I upgraded to kernel 6.12.x for my
home server, I got the following error. Which causes soft rdma doesn't
work for 6.12.x with my setup.

I read this bug descriptions for several times. To be honest, I am also confused with this problem.

It seems that with 6.6.x kernel, it can work well. And with 6.12.x, it can not work.

So maybe "git bisect" is a good tool to find the commit that causes this problems.

Zhu Yanjun


The kernel warning is:

# [  134.041117] ------------[ cut here ]------------
# [  134.041120] WARNING: CPU: 9 PID: 0 at
drivers/infiniband/sw/rxe/rxe_net.c:357 rxe_skb_tx_dtor+0xbc/0x110
[rdma_rxe]
# [  134.041128] Modules linked in: dm_cache_smq dm_cache
dm_persistent_data dm_bio_prison 8021q garp mrp macvtap macvlan veth
dummy bridge stp llc nft_ct nf_tables sch_fq_codel virtio_vdpa vduse
vdpa rdma_rxe ip6_udp_tunnel udp_tunnel nvme_rdma nvmet_rdma raid456
async_raid6_recov async_memcpy async_pq async_xor async_tx xor
raid6_pq raid1 raid0 bcache nvmet nvme_auth dm_writecache msr thermal
rpcrdma rdma_ucm ib_iser ib_umad rdma_cm ib_ipoib iw_cm ib_cm
rt2800usb rt2x00usb rt2800lib rt2x00lib ee1004 wmi_bmof mxm_wmi evdev
mac80211 snd_hda_codec_realtek snd_hda_codec_generic
snd_hda_scodec_component cfg80211 rfkill snd_hda_intel
snd_intel_dspcfg nf_conntrack_tftp snd_intel_sdw_acpi
nf_conntrack_netbios_ns nf_conntrack_broadcast nf_nat_ftp
nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4
snd_hda_codec nct6775 snd_hda_core nct6775_core snd_hwdep hwmon_vid
vhost_net snd_pcm mlx4_ib tun kvm_amd snd_timer vhost gpio_amdpt
ib_uverbs vhost_iotlb i2c_piix4 snd tap rapl pcspkr soundcore
i2c_smbus rtc_cmos wmi
# [  134.041242]  button gpio_generic ib_core kvm k10temp drm fuse
loop dmi_sysfs dm_snapshot dm_bufio mlx4_en sd_mod nvme_tcp
nvme_fabrics nvme ixgbe crct10dif_pclmul mdio crc32_pclmul mdio_devres
crc32c_intel igb ghash_clmulni_intel i2c_algo_bit sha512_ssse3 libphy
nvme_core sha256_ssse3 ptp sha1_ssse3 hwmon xhci_pci aesni_intel ahci
mlx4_core crypto_simd libahci xhci_hcd libata i2c_core sunrpc
dm_mirror dm_region_hash dm_log be2iscsi iscsi_tcp libiscsi_tcp
libiscsi scsi_transport_iscsi dm_multipath dm_mod efivarfs
# [  134.041324] CPU: 9 UID: 0 PID: 0 Comm: swapper/9 Tainted: G
  W          6.12.4-gentoo #2
# [  134.041329] Tainted: [W]=WARN
# [  134.041331] Hardware name: To Be Filled By O.E.M. X370 Killer
SLI/X370 Killer SLI, BIOS P10.31 08/23/2024
# [  134.041335] RIP: 0010:rxe_skb_tx_dtor+0xbc/0x110 [rdma_rxe]
# [  134.041342] Code: 00 f0 0f c1 87 80 00 00 00 83 f8 01 74 39 85 c0
7f 96 5b 5d be 03 00 00 00 48 89 d7 41 5c e9 7b 43 39 e0 41 f6 04 24
01 75 26 <0f> 0b 5b 5d 41 5c c3 ff c8 83 f8 0f 77 a4 49 8d bc 24 d8 03
00 00
# [  134.041347] RSP: 0018:ffffc90000338d20 EFLAGS: 00010246
# [  134.041352] RAX: 0000000000000000 RBX: ffff888101b38200 RCX:
0000000000000000
# [  134.041355] RDX: ffff88810092de80 RSI: 000000000000000e RDI:
ffff88810092de80
# [  134.041358] RBP: 0000000000000000 R08: ffff8881253b0000 R09:
0000000000000001
# [  134.041360] R10: ffff8881251fac00 R11: 0000000000000080 R12:
ffff888125220000
# [  134.041364] R13: 0000000000000056 R14: 0000000000010000 R15:
ffff888101b38200
# [  134.041367] FS:  0000000000000000(0000) GS:ffff889fbee40000(0000)
knlGS:0000000000000000
# [  134.041370] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
# [  134.041373] CR2: 000055ec3e726750 CR3: 000000000341b000 CR4:
0000000000350ef0
# [  134.041376] Call Trace:
# [  134.041379]  <IRQ>
# [  134.041381]  ? rxe_skb_tx_dtor+0xbc/0x110 [rdma_rxe]
# [  134.041388]  ? __warn.cold+0x93/0xed
# [  134.041392]  ? rxe_skb_tx_dtor+0xbc/0x110 [rdma_rxe]
# [  134.041398]  ? report_bug+0xe2/0x170
# [  134.041402]  ? handle_bug+0x4f/0x90
# [  134.041405]  ? exc_invalid_op+0x13/0x60
# [  134.041409]  ? asm_exc_invalid_op+0x16/0x20
# [  134.041413]  ? rxe_skb_tx_dtor+0xbc/0x110 [rdma_rxe]
# [  134.041420]  skb_release_head_state+0x20/0x90
# [  134.041424]  napi_consume_skb+0x3e/0x100
# [  134.041428]  mlx4_en_free_tx_desc+0x11d/0x200 [mlx4_en]
# [  134.041433]  mlx4_en_process_tx_cq+0x182/0x560 [mlx4_en]
# [  134.041439]  mlx4_en_poll_tx_cq+0x28/0x70 [mlx4_en]
# [  134.041443]  __napi_poll+0x25/0x150
# [  134.041447]  net_rx_action+0x300/0x370
# [  134.041452]  ? mlx4_msi_x_interrupt+0xd/0x20 [mlx4_core]
# [  134.041465]  handle_softirqs+0xf0/0x2a0
# [  134.041469]  irq_exit_rcu+0x74/0xd0
# [  134.041473]  common_interrupt+0xb8/0xd0
# [  134.041476]  </IRQ>
# [  134.041478]  <TASK>
# [  134.041480]  asm_common_interrupt+0x22/0x40
# [  134.041484] RIP: 0010:cpuidle_enter_state+0xaf/0x410
# [  134.041488] Code: de 01 00 00 e8 d2 9c 5f ff e8 dd f2 ff ff 49 89
c5 0f 1f 44 00 00 31 ff e8 9e bd 5e ff 45 84 ff 0f 85 b0 01 00 00 fb
45 85 f6 <0f> 88 88 01 00 00 48 8b 04 24 49 63 ce 4c 89 ea 48 6b f1 68
48 29
# [  134.041492] RSP: 0018:ffffc9000013be78 EFLAGS: 00000202
# [  134.041496] RAX: ffff889fbee40000 RBX: ffff888104c68400 RCX:
0000000000000000
# [  134.041500] RDX: 0000001f357196a8 RSI: ffffffff82142fc3 RDI:
ffffffff82146c8f
# [  134.041503] RBP: 0000000000000002 R08: 0000000000000002 R09:
0000000000000000
# [  134.041506] R10: 0000000000000018 R11: ffff889fbee72144 R12:
ffffffff824dfd40
# [  134.041509] R13: 0000001f357196a8 R14: 0000000000000002 R15:
0000000000000000
# [  134.041513]  ? cpuidle_enter_state+0xa2/0x410
# [  134.041517]  cpuidle_enter+0x29/0x40
# [  134.041520]  do_idle+0x19a/0x200
# [  134.041524]  cpu_startup_entry+0x25/0x30
# [  134.041527]  start_secondary+0xfe/0x120
# [  134.041530]  common_startup_64+0x13e/0x141
# [  134.041535]  </TASK>
# [  134.041537] ---[ end trace 0000000000000000 ]---

the system has a "Hewlett-Packard Company InfiniBand FDR/Ethernet
10Gb/40Gb 2-port 544+FLR-QSFP Adapter"
I created a software bridge named br-fp and bridge 2 ports on this adapter,
I also created a veth pair (venfp <-> venfpbr), venfpbr is connected
to bridge br-fp.
then I do 'rdma link add rxe_venfp type rxe netdev venfp', If I try to
do rdma connect to remote storge, the error occurs.

if you need other info, Please tell me.

Thanks.





[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Photo]     [Yosemite News]     [Yosemite Photos]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux