On Thu, 22 Aug 2024 at 18:25, Alasdair McWilliam <alasdair.mcwilliam@xxxxxxxxxxx> wrote:
>
> Hi,
>
> I've been testing apps that use XSK+ZC on ICE with newer builds of the 6.1 LTS kernel in preparation for some production upgrades, and I've started to notice some instability on newer versions. I can reproduce the issue easily in the lab.
>
> Config:
> - Known good multi-threaded application (i.e. production grade)
> - Uses eBPF and AF_XDP with zero copy to act as 'bump in wire' in network
> - Xeon's with Intel E810-CQDA2 (firmware: 3.20 0x8000d83e 1.3146.0)
> - Effectively a vanilla rebuild of 6.1 using configs from el-repo project
>
> Scenario:
> - Noticing hard kernel faults when shutting down application
> - Can happen if the process is shut down via systemctl stop
> - Can even happen with a simple kill -9 command to the PID
> - Appears in builds after 6.1.87
>
> Tested kernels:
> - 6.1.84: process exits smoothly
> - 6.1.87: process exits smoothly
> - 6.1.97: BUG: unable to handle page fault for address
> - 6.1.106: BUG: unable to handle page fault for address
>
> Kdump log is below [1] from 6.1.106 but does seem to be the same in the earlier version.
>
> Can anyone advise if this is a known issue?
>
> I don't have any builds between 6.1.87 and 6.1.97 but I can spend some time trying to pinpoint the exact version things start to go wrong in, if it would help anyone better equipped than me to debug!

Hi Alasdair,

It would be of great help if you could pinpoint the exact version for this breakage. Hopefully we could then find the commit in the ice driver that breaks your app, since there should be just a handful of commits in the ice driver for any stable release.
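For the bisection itself, a plain binary search over the stable patch levels keeps the number of builds small: with 6.1.87 known good and 6.1.97 known bad there are ten steps between them, so at most four builds should be needed. Below is a minimal Python sketch of that search. `find_first_bad` and the `is_good` callback are hypothetical names; `is_good(n)` stands in for "build and boot 6.1.n, start and stop the XSK app, report whether the oops shows up". It assumes the regression appears once and stays (a single good-to-bad transition).

```python
from typing import Callable


def find_first_bad(good: int, bad: int, is_good: Callable[[int], bool]) -> int:
    """Binary-search 6.1.<patch> stable releases for the first bad one.

    `good` is a patch level known to work (e.g. 87), `bad` one known to
    crash (e.g. 97). `is_good(n)` is a hypothetical callback that builds and
    boots 6.1.n, starts and stops the XSK application, and returns False if
    the xp_free oops shows up. Assumes a single good -> bad transition.
    """
    if not good < bad:
        raise ValueError("expected good < bad")
    # Invariant: `good` is known good, `bad` is known bad.
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_good(mid):
            good = mid  # regression came in after mid
        else:
            bad = mid   # regression is at mid or earlier
    return bad          # first release that shows the crash
```

Called as `find_first_bad(87, 97, is_good)`, it returns the first failing patch level. With two adjacent releases in hand, `git log v6.1.X..v6.1.Y -- drivers/net/ethernet/intel/ice/` in a linux-stable checkout (X last good, Y first bad) should list the few ice commits worth looking at.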
> Kind regards
> Alasdair
>
> [1] kdump log
>
> [ 158.666867] BUG: unable to handle page fault for address: ffffa6510e5580c0
> [ 158.666887] #PF: supervisor read access in kernel mode
> [ 158.666896] #PF: error_code(0x0000) - not-present page
> [ 158.666903] PGD 100000067 P4D 100000067 PUD 106dc4067 PMD 0
> [ 158.666914] Oops: 0000 [#1] PREEMPT SMP PTI
> [ 158.666922] CPU: 7 PID: 1808 Comm: tlndd.bin Kdump: loaded Tainted: G E 6.1.106-1.X.el9.x86_64 #1
> [ 158.666940] Hardware name: Supermicro SYS-1028R-TDW/X10DDW-i, BIOS 3.2 12/16/2019
> [ 158.666950] RIP: 0010:xp_free+0x11/0x80
> [ 158.666962] Code: 8b 04 d0 48 83 e0 fe 48 01 f0 c3 cc cc cc cc 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 56 41 55 41 54 55 48 8d 6f 58 53 <48> 8b 47 58 48 39 c5 74 0d 5b 5d 41 5c 41 5d 41 5e c3 cc cc cc cc
> [ 158.666985] RSP: 0018:ffffa65089e8b760 EFLAGS: 00010202
> [ 158.666993] RAX: ffff8fcf077c0000 RBX: 0000000000000001 RCX: 0000000000000000
> [ 158.667003] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffa6510e558068
> [ 158.667012] RBP: ffffa6510e5580c0 R08: fffff8c50415a108 R09: ffff8fc7cac60000
> [ 158.667022] R10: 0000000000000219 R11: ffffffffffffffff R12: 0000000000000fff
> [ 158.667031] R13: 0000000000000000 R14: 0000000000000000 R15: ffff8fc7c139d340
> [ 158.667040] FS: 00007f8504996880(0000) GS:ffff8fcedfdc0000(0000) knlGS:0000000000000000
> [ 158.667050] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 158.667058] CR2: ffffa6510e5580c0 CR3: 00000001448e2002 CR4: 00000000001706e0
> [ 158.667068] Call Trace:
> [ 158.667075] <TASK>
> [ 158.667082] ? show_trace_log_lvl+0x1c4/0x2df
> [ 158.667094] ? show_trace_log_lvl+0x1c4/0x2df
> [ 158.667103] ? ice_xsk_clean_rx_ring+0x39/0x60 [ice]
> [ 158.667157] ? __die_body.cold+0x8/0xd
> [ 158.667166] ? page_fault_oops+0xac/0x150
> [ 158.667176] ? fixup_exception+0x22/0x340
> [ 158.667185] ? exc_page_fault+0xb2/0x150
> [ 158.667195] ? asm_exc_page_fault+0x22/0x30
> [ 158.667206] ? xp_free+0x11/0x80
> [ 158.667215] ice_xsk_clean_rx_ring+0x39/0x60 [ice]
> [ 158.667250] ice_clean_rx_ring+0x157/0x180 [ice]
> [ 158.667284] ice_down+0x172/0x2b0 [ice]
> [ 158.667311] ? ice_xdp_setup_prog+0x3b0/0x3b0 [ice]
> [ 158.667337] ice_xdp_setup_prog+0xe3/0x3b0 [ice]
> [ 158.667364] ? ice_xdp_setup_prog+0x3b0/0x3b0 [ice]
> [ 158.667391] dev_xdp_install+0xc7/0x100
> [ 158.667402] dev_xdp_attach+0x1e0/0x560
> [ 158.667412] do_setlink+0x7a8/0xc10
> [ 158.667422] ? __nla_validate_parse+0x12b/0x1b0
> [ 158.667436] __rtnl_newlink+0x540/0x650
> [ 158.667446] rtnl_newlink+0x44/0x70
> [ 158.667454] rtnetlink_rcv_msg+0x15c/0x3d0
> [ 158.667477] ? rtnl_calcit.isra.0+0x140/0x140
> [ 158.667485] netlink_rcv_skb+0x51/0x100
> [ 158.667727] netlink_unicast+0x246/0x360
> [ 158.667953] netlink_sendmsg+0x24e/0x4b0
> [ 158.668173] __sock_sendmsg+0x62/0x70
> [ 158.668389] ____sys_sendmsg+0x247/0x2d0
> [ 158.668602] ? copy_msghdr_from_user+0x6d/0xa0
> [ 158.668815] ___sys_sendmsg+0x88/0xd0
> [ 158.669028] ? __sk_destruct+0x156/0x230
> [ 158.669234] ? kmem_cache_free+0x134/0x300
> [ 158.669437] ? rcu_nocb_try_bypass+0x4a/0x440
> [ 158.669634] ? __sk_destruct+0x156/0x230
> [ 158.669825] ? _raw_spin_unlock_irqrestore+0x23/0x40
> [ 158.670010] ? mod_objcg_state+0xc9/0x2f0
> [ 158.670186] ? refill_obj_stock+0xae/0x160
> [ 158.670359] ? rseq_get_rseq_cs.isra.0+0x16/0x220
> [ 158.670529] ? rcu_nocb_try_bypass+0x4a/0x440
> [ 158.670696] ? rseq_ip_fixup+0x72/0x1e0
> [ 158.670860] __sys_sendmsg+0x59/0xa0
> [ 158.671021] ? syscall_trace_enter.constprop.0+0x11e/0x190
> [ 158.671185] do_syscall_64+0x35/0x80
> [ 158.671345] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
> [ 158.671503] RIP: 0033:0x7f850510f917
> [ 158.671658] Code: 0e 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b9 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10
> [ 158.671993] RSP: 002b:00007ffcc805f238 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
> [ 158.672171] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f850510f917
> [ 158.672352] RDX: 0000000000000000 RSI: 000000000198e9e8 RDI: 0000000000000009
> [ 158.672534] RBP: 0000000001933c00 R08: 0000000001935980 R09: 0000000000460e48
> [ 158.672716] R10: 0000000000000011 R11: 0000000000000246 R12: 0000000001933c30
> [ 158.672899] R13: 0000000000515fd8 R14: 000000000198e9d0 R15: 0000000000513690
> [ 158.673086] </TASK>
> [ 158.673269] Modules linked in: bonding(E) tls(E) nft_fib_inet(E) nft_fib_ipv4(E) nft_fib_ipv6(E) nft_fib(E) nft_reject_inet(E) nf_reject_ipv4(E) nf_reject_ipv6(E) nft_reject(E) nft_ct(E) nft_chain_nat(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) rfkill(E) ip_set(E) nf_tables(E) libcrc32c(E) nfnetlink(E) vfat(E) fat(E) ipmi_ssif(E) intel_rapl_msr(E) intel_rapl_common(E) sb_edac(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) coretemp(E) kvm_intel(E) iTCO_wdt(E) intel_pmc_bxt(E) iTCO_vendor_support(E) kvm(E) irqbypass(E) rapl(E) intel_cstate(E) ast(E) intel_uncore(E) drm_vram_helper(E) drm_ttm_helper(E) ttm(E) pcspkr(E) mei_me(E) drm_kms_helper(E) i2c_i801(E) lpc_ich(E) mei(E) i2c_smbus(E) mxm_wmi(E) ioatdma(E) acpi_ipmi(E) ipmi_si(E) ipmi_devintf(E) ipmi_msghandler(E) acpi_pad(E) acpi_power_meter(E) joydev(E) drm(E) fuse(E) ext4(E) mbcache(E) jbd2(E) sd_mod(E) t10_pi(E) sg(E) ahci(E) crct10dif_pclmul(E) crc32_pclmul(E) libahci(E) crc32c_intel(E) ice(E)
> [ 158.673314] polyval_clmulni(E) polyval_generic(E) igb(E) libata(E) ghash_clmulni_intel(E) i2c_algo_bit(E) dca(E) wmi(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E)
> [ 158.675578] CR2: ffffa6510e5580c0