On Mon, Sep 02, 2024 at 04:09:33PM +0000, Alasdair McWilliam wrote:
> Good evening,
>
> Looks like commit a62c50545b4d is the culprit.
>
> I've produced a production-grade build of kernel 6.1.95 with commit
> a62c50545b4d backed out. Seems I can no longer trigger the fault. I can
> kill -9 the process while pushing 50Gbps / 14Mpps and the process is
> just restarted and resumes like it should.
>
> I'm going to back out the same commit from 6.1.106 for our production
> builds and verify that fixes the issue there too.
>
> Can you advise if this will be reversed in future commits, or if you
> have an alternate fix in the wings?

We've been working recently on somewhat related issues and it looks like
not every commit from [0] has been backported.

$ git log --oneline v6.1.103..v6.1.104 drivers/net/ethernet/intel/ice/
5a80b682e3e1 ice: add missing WRITE_ONCE when clearing ice_rx_ring::xdp_prog
8782f0fcb19d ice: replace synchronize_rcu with synchronize_net
15115033f056 ice: don't busy wait for Rx queue disable in ice_qp_dis()
3dbc58774e58 ice: respect netif readiness in AF_XDP ZC related ndo's

can you apply the rest of it on top of 6.1.107 and see the result?

[0]: https://lore.kernel.org/all/20240729200716.681496-1-anthony.l.nguyen@xxxxxxxxx/

>
> Thank you ! :-)
> Alasdair
>
>
> ________________________________________
> From: Alasdair McWilliam <alasdair.mcwilliam@xxxxxxxxxxx>
> Sent: 27 August 2024 14:33
> To: Maciej Fijalkowski; Magnus Karlsson
> Cc: xdp-newbies@xxxxxxxxxxxxxxx
> Subject: Re: ICE + XSK ZC - page faults on 6.1 LTS when process exits?
>
> Hi Maciej, Magnus,
>
> Apologies for slow reply – bank holiday in the UK yesterday.
>
> Just a quick update – it’s quicker and easier for me to build a released version of code than it is to build a production kernel from a git tree due to build apparatus.
>
> Based on the suggestion to back out commit a62c50545b4d, I have taken the first step of identifying that said commit was included in 6.1.95. So, I’ve run both 6.1.95 and 6.1.94 through a build to test both. Some quick and dirty testing shows:
>
> * I can reproduce the issue on 6.1.95
> * I cannot so far reproduce the issue on 6.1.94
>
> I’ve only tested the latter version 3-4 times so I’m going to keep throwing dead processes at it in different ways to just to be sure 6.1.94 is not affected. Then, to validate, I will grab the actual git tree at 6.1.95 and manually back out a62c50545b4d and re-test. But, this will take me a little longer.
>
> Thanks
> Alasdair
>
>
> From: Maciej Fijalkowski <maciej.fijalkowski@xxxxxxxxx>
> Date: Friday, 23 August 2024 at 15:09
> To: Magnus Karlsson <magnus.karlsson@xxxxxxxxx>
> Cc: Alasdair McWilliam <alasdair.mcwilliam@xxxxxxxxxxx>, xdp-newbies@xxxxxxxxxxxxxxx <xdp-newbies@xxxxxxxxxxxxxxx>
> Subject: Re: ICE + XSK ZC - page faults on 6.1 LTS when process exits?
> On Fri, Aug 23, 2024 at 10:17:35AM +0200, Magnus Karlsson wrote:
> > On Thu, 22 Aug 2024 at 18:25, Alasdair McWilliam
> > <alasdair.mcwilliam@xxxxxxxxxxx> wrote:
> > >
> > > Hi,
> > >
> > > I've been testing apps that use XSK+ZC on ICE with newer builds of the 6.1 LTS kernel in preparation for some production upgrades, and I've started to notice some instability on newer versions. I can reproduce the issue easily in the lab.
> > >
> > > Config:
> > > - Known good multi-threaded application (i.e. production grade)
> > > - Uses eBPF and AF_XDP with zero copy to act as 'bump in wire' in network
> > > - Xeon's with Intel E810-CQDA2 (firmware: 3.20 0x8000d83e 1.3146.0)
> > > - Effectively a vanilla rebuild of 6.1 using configs from el-repo project
> > >
> > > Scenario:
> > > - Noticing hard kernel faults when shutting down application
> > > - Can happen if the process is shut down via systemctl stop
> > > - Can even happen with a simple kill -9 command to the PID
> > > - Appears in builds after 6.1.87
> > >
> > > Tested kernels:
> > > - 6.1.84: process exits smoothly
> > > - 6.1.87: process exits smoothly
> > > - 6.1.97: BUG: unable to handle page fault for address
> > > - 6.1.106: BUG: unable to handle page fault for address
> > >
> > > Kdump log is below [1] from 6.1.106 but does seem to be the same in the earlier version.
> > >
> > > Can anyone advise if this is a known issue?
> > >
> > > I don't have any builds between 6.1.87 and 6.1.97 but I can spend some time trying to pinpoint the exact version things start to go wrong in, if it would help anyone better equipped than me to debug!
> >
> > Hi Alasdair,
> >
> > It would be of great help if you could pinpoint the exact version for
> > this breakage. Hopefully we could then find the commit in the ice
> > driver that breaks your app, since there should be just a handful of
> > commits in the ice driver for any stable release.
> >
> $ git log --oneline v6.1.87..v6.1.97 drivers/net/ethernet/intel/ice/
> dd37b86999fd ice: Fix VSI list rule with ICE_SW_LKUP_LAST type
> 224b69e8751c ice: avoid IRQ collision to fix init failure on ACPI S3 resume
> 531d85b4fb66 ice: move RDMA init to ice_idc.c
> a62c50545b4d ice: remove af_xdp_zc_qps bitmap
> 447a5433bd1e ice: remove null checks before devm_kfree() calls
> a388961be5ed ice: Introduce new parameters in ice_sched_node
> 17ccdebe5ac7 ice: fix iteration of TLVs in Preserved Fields Area
> 07cbc5512023 ice: fix accounting if a VLAN already exists
> 5ef3a27c6142 ice: Interpret .set_channels() input differently
> 90cbd4c081bb ice: remove unnecessary duplicate checks for VF VSI ID
> 59161a21cae0 ice: pass VSI pointer into ice_vc_isvalid_q_id
> 6a6ebec40820 ice: tc: allow zero flags in parsing tc flower
>
> can you revert a62c50545b4d and see if the issue persists?
> > >
> > > Kind regards
> > > Alasdair
> > >
> > > [1] kdump log
> > >
> > > [ 158.666867] BUG: unable to handle page fault for address: ffffa6510e5580c0
> > > [ 158.666887] #PF: supervisor read access in kernel mode
> > > [ 158.666896] #PF: error_code(0x0000) - not-present page
> > > [ 158.666903] PGD 100000067 P4D 100000067 PUD 106dc4067 PMD 0
> > > [ 158.666914] Oops: 0000 [#1] PREEMPT SMP PTI
> > > [ 158.666922] CPU: 7 PID: 1808 Comm: tlndd.bin Kdump: loaded Tainted: G E 6.1.106-1.X.el9.x86_64 #1
> > > [ 158.666940] Hardware name: Supermicro SYS-1028R-TDW/X10DDW-i, BIOS 3.2 12/16/2019
> > > [ 158.666950] RIP: 0010:xp_free+0x11/0x80
> > > [ 158.666962] Code: 8b 04 d0 48 83 e0 fe 48 01 f0 c3 cc cc cc cc 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 56 41 55 41 54 55 48 8d 6f 58 53 <48> 8b 47 58 48 39 c5 74 0d 5b 5d 41 5c 41 5d 41 5e c3 cc cc cc cc
> > > [ 158.666985] RSP: 0018:ffffa65089e8b760 EFLAGS: 00010202
> > > [ 158.666993] RAX: ffff8fcf077c0000 RBX: 0000000000000001 RCX: 0000000000000000
> > > [ 158.667003] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffa6510e558068
> > > [ 158.667012] RBP: ffffa6510e5580c0 R08: fffff8c50415a108 R09: ffff8fc7cac60000
> > > [ 158.667022] R10: 0000000000000219 R11: ffffffffffffffff R12: 0000000000000fff
> > > [ 158.667031] R13: 0000000000000000 R14: 0000000000000000 R15: ffff8fc7c139d340
> > > [ 158.667040] FS: 00007f8504996880(0000) GS:ffff8fcedfdc0000(0000) knlGS:0000000000000000
> > > [ 158.667050] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [ 158.667058] CR2: ffffa6510e5580c0 CR3: 00000001448e2002 CR4: 00000000001706e0
> > > [ 158.667068] Call Trace:
> > > [ 158.667075] <TASK>
> > > [ 158.667082] ? show_trace_log_lvl+0x1c4/0x2df
> > > [ 158.667094] ? show_trace_log_lvl+0x1c4/0x2df
> > > [ 158.667103] ? ice_xsk_clean_rx_ring+0x39/0x60 [ice]
> > > [ 158.667157] ? __die_body.cold+0x8/0xd
> > > [ 158.667166] ? page_fault_oops+0xac/0x150
> > > [ 158.667176] ? fixup_exception+0x22/0x340
> > > [ 158.667185] ? exc_page_fault+0xb2/0x150
> > > [ 158.667195] ? asm_exc_page_fault+0x22/0x30
> > > [ 158.667206] ? xp_free+0x11/0x80
> > > [ 158.667215] ice_xsk_clean_rx_ring+0x39/0x60 [ice]
> > > [ 158.667250] ice_clean_rx_ring+0x157/0x180 [ice]
> > > [ 158.667284] ice_down+0x172/0x2b0 [ice]
> > > [ 158.667311] ? ice_xdp_setup_prog+0x3b0/0x3b0 [ice]
> > > [ 158.667337] ice_xdp_setup_prog+0xe3/0x3b0 [ice]
> > > [ 158.667364] ? ice_xdp_setup_prog+0x3b0/0x3b0 [ice]
> > > [ 158.667391] dev_xdp_install+0xc7/0x100
> > > [ 158.667402] dev_xdp_attach+0x1e0/0x560
> > > [ 158.667412] do_setlink+0x7a8/0xc10
> > > [ 158.667422] ? __nla_validate_parse+0x12b/0x1b0
> > > [ 158.667436] __rtnl_newlink+0x540/0x650
> > > [ 158.667446] rtnl_newlink+0x44/0x70
> > > [ 158.667454] rtnetlink_rcv_msg+0x15c/0x3d0
> > > [ 158.667477] ? rtnl_calcit.isra.0+0x140/0x140
> > > [ 158.667485] netlink_rcv_skb+0x51/0x100
> > > [ 158.667727] netlink_unicast+0x246/0x360
> > > [ 158.667953] netlink_sendmsg+0x24e/0x4b0
> > > [ 158.668173] __sock_sendmsg+0x62/0x70
> > > [ 158.668389] ____sys_sendmsg+0x247/0x2d0
> > > [ 158.668602] ? copy_msghdr_from_user+0x6d/0xa0
> > > [ 158.668815] ___sys_sendmsg+0x88/0xd0
> > > [ 158.669028] ? __sk_destruct+0x156/0x230
> > > [ 158.669234] ? kmem_cache_free+0x134/0x300
> > > [ 158.669437] ? rcu_nocb_try_bypass+0x4a/0x440
> > > [ 158.669634] ? __sk_destruct+0x156/0x230
> > > [ 158.669825] ? _raw_spin_unlock_irqrestore+0x23/0x40
> > > [ 158.670010] ? mod_objcg_state+0xc9/0x2f0
> > > [ 158.670186] ? refill_obj_stock+0xae/0x160
> > > [ 158.670359] ? rseq_get_rseq_cs.isra.0+0x16/0x220
> > > [ 158.670529] ? rcu_nocb_try_bypass+0x4a/0x440
> > > [ 158.670696] ? rseq_ip_fixup+0x72/0x1e0
> > > [ 158.670860] __sys_sendmsg+0x59/0xa0
> > > [ 158.671021] ? syscall_trace_enter.constprop.0+0x11e/0x190
> > > [ 158.671185] do_syscall_64+0x35/0x80
> > > [ 158.671345] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
> > > [ 158.671503] RIP: 0033:0x7f850510f917
> > > [ 158.671658] Code: 0e 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b9 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10
> > > [ 158.671993] RSP: 002b:00007ffcc805f238 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
> > > [ 158.672171] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f850510f917
> > > [ 158.672352] RDX: 0000000000000000 RSI: 000000000198e9e8 RDI: 0000000000000009
> > > [ 158.672534] RBP: 0000000001933c00 R08: 0000000001935980 R09: 0000000000460e48
> > > [ 158.672716] R10: 0000000000000011 R11: 0000000000000246 R12: 0000000001933c30
> > > [ 158.672899] R13: 0000000000515fd8 R14: 000000000198e9d0 R15: 0000000000513690
> > > [ 158.673086] </TASK>
> > > [ 158.673269] Modules linked in: bonding(E) tls(E) nft_fib_inet(E) nft_fib_ipv4(E) nft_fib_ipv6(E) nft_fib(E) nft_reject_inet(E) nf_reject_ipv4(E) nf_reject_ipv6(E) nft_reject(E) nft_ct(E) nft_chain_nat(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) rfkill(E) ip_set(E) nf_tables(E) libcrc32c(E) nfnetlink(E) vfat(E) fat(E) ipmi_ssif(E) intel_rapl_msr(E) intel_rapl_common(E) sb_edac(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) coretemp(E) kvm_intel(E) iTCO_wdt(E) intel_pmc_bxt(E) iTCO_vendor_support(E) kvm(E) irqbypass(E) rapl(E) intel_cstate(E) ast(E) intel_uncore(E) drm_vram_helper(E) drm_ttm_helper(E) ttm(E) pcspkr(E) mei_me(E) drm_kms_helper(E) i2c_i801(E) lpc_ich(E) mei(E) i2c_smbus(E) mxm_wmi(E) ioatdma(E) acpi_ipmi(E) ipmi_si(E) ipmi_devintf(E) ipmi_msghandler(E) acpi_pad(E) acpi_power_meter(E) joydev(E) drm(E) fuse(E) ext4(E) mbcache(E) jbd2(E) sd_mod(E) t10_pi(E) sg(E) ahci(E) crct10dif_pclmul(E) crc32_pclmul(E) libahci(E) crc32c_intel(E) ice(E)
> > > [ 158.673314] polyval_clmulni(E) polyval_generic(E) igb(E) libata(E) ghash_clmulni_intel(E) i2c_algo_bit(E) dca(E) wmi(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E)
> > > [ 158.675578] CR2: ffffa6510e5580c0
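
For anyone wanting to repeat the "which patches from the series reached stable" check, a minimal sketch follows. It assumes stable backports keep the upstream subject line unchanged, which is usual but not guaranteed. The function names and the last entry of SERIES are hypothetical and purely illustrative; the real subjects of the series in [0] have to be taken from the series itself and are not reproduced here.

```python
def parse_oneline(log_text):
    """Map commit subject -> abbreviated hash from `git log --oneline` output."""
    subjects = {}
    for line in log_text.splitlines():
        parts = line.strip().split(maxsplit=1)
        if len(parts) == 2:  # skip blank or hash-only lines
            commit, subject = parts
            subjects[subject] = commit
    return subjects


def missing_from_backport(series_subjects, log_text):
    """Return the series subjects that do not appear in the stable log."""
    backported = parse_oneline(log_text)
    return [s for s in series_subjects if s not in backported]


# Output of: git log --oneline v6.1.103..v6.1.104 drivers/net/ethernet/intel/ice/
STABLE_LOG = """\
5a80b682e3e1 ice: add missing WRITE_ONCE when clearing ice_rx_ring::xdp_prog
8782f0fcb19d ice: replace synchronize_rcu with synchronize_net
15115033f056 ice: don't busy wait for Rx queue disable in ice_qp_dis()
3dbc58774e58 ice: respect netif readiness in AF_XDP ZC related ndo's
"""

# Subjects of the upstream series. The first four are the ones listed above;
# the last one is a HYPOTHETICAL placeholder, not a real patch title.
SERIES = [
    "ice: respect netif readiness in AF_XDP ZC related ndo's",
    "ice: don't busy wait for Rx queue disable in ice_qp_dis()",
    "ice: replace synchronize_rcu with synchronize_net",
    "ice: add missing WRITE_ONCE when clearing ice_rx_ring::xdp_prog",
    "ice: hypothetical patch missing from stable",
]

if __name__ == "__main__":
    for subject in missing_from_backport(SERIES, STABLE_LOG):
        print(subject)
```

Matching on subject rather than hash is deliberate here, since a stable commit gets a different hash than its upstream original.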