On Mon, Nov 04, 2024 at 12:18:07PM +0000, Alasdair McWilliam wrote:
> On 04/11/2024 07:11, Larysa Zaremba wrote:
>
> >> It's been a minute since I've looked at this due to other commitments
> >> but accidentally bumped into the fault again when testing the latest 6.6
> >> LTS for a new feature of our software. (I forgot to revert the commit
> >> for "ice: remove af_xdp_zc_qps bitmap" in our build system.)
> >>
> >> This led me to wonder about the current version, and I can trigger the
> >> same crash on 6.11.5 [3].
> >>
> >> Reverting "ice: remove af_xdp_zc_qps bitmap" [1] in the current mainline
> >> is a little more complicated as commit ebc33a3f8d0a ("ice: improve
> >> updating ice_{t,r}x_ring::xsk_pool") also changes things a little so the
> >> reversion doesn't work cleanly.
> >>
> >> I have tweaked everything a little so the below patch [2] applies cleanly
> >> to 6.11.5 and 6.12-rc5 and seems to fix the fault.
> >>
> >> Thought I'd bubble this up as it's definitely still an issue in the
> >> mainline kernel as of now.
> >>
> >> Thanks
> >> Alasdair
> >>
> >
> > Hello,
> > Could you please share the reproduction steps? I will look into this.
>
> Hello,
>
> I should probably have provided better steps to reproduce - apologies.
>
> Our stack uses AF_XDP in zero copy mode with shared UMEM between XSK
> sockets.

Thanks! Just letting you and anyone interested know that I was able to reliably
reproduce the issue and have found the root cause. Hopefully, I will be able to
send the exact fix soon.

>
> To isolate other bugs in the past we've used a modified xdpsock app
> based on code previously in kernel samples. The original sample has
> since been taken out of the kernel repo, but we maintained the modified
> version in our public repos here [1].
>
> There's lots in the readme but suffice to say if you run the build.sh
> with bash, it will compile the xdpsock_multi user-space app and the
> accompanying xdpsock_multi.bpf eBPF app. You'll also need the necessary
> dependencies libxdp/libbpf et al.
>
> I can reproduce the issue with this app using 8 channels. It can fault
> in two ways (step C or D) below.
>
> Terminal 1:
>
> A# ethtool -L <nic> combined 8
> B# ./xdpsock_multi --l2fwd --interface ice1_1 --zero-copy --channels 8
>
> Terminal 2:
>
> C# kill -9 $(pidof xdpsock_multi)
> D# ip link set dev <nic> xdp off
>
> Sometimes the act of killing the process (step C) causes a kernel crash [2].
>
> Other times, it may survive, leaving an orphaned XDP program attached to
> the NIC. Unloading this manually (step D) causes a kernel crash [3].
>
> Stack traces are actually different, so I've provided both.
>
> Affects:
> 6.1.x
> 6.6.x
> 6.11.x
>
> Hardware is E810-CQDA2
> Firmware is 3.20 0x8000d83e 1.3146.0
>
> Let me know if you need anything further.
>
> Thanks!
> Alasdair
>
>
> [1] https://github.com/OpenSource-THG/xdpsock-sample
>
> [2] Kernel crash triggered by step C
>
> [ 220.921136] BUG: unable to handle page fault for address:
> ffffa3eee1637f14
> [ 220.921175] #PF: supervisor write access in kernel mode
> [ 220.921196] #PF: error_code(0x0002) - not-present page
> [ 220.921217] PGD 100000067 P4D 100000067 PUD 100238067 PMD 0
> [ 220.921244] Oops: Oops: 0002 [#1] PREEMPT SMP PTI
> [ 220.921267] CPU: 5 UID: 0 PID: 0 Comm: swapper/5 Kdump: loaded
> Tainted: G E
> 6.11.5-1.thg.836e8867d7.241031.135507.el9.x86_64 #1
> [ 220.921315] Tainted: [E]=UNSIGNED_MODULE
> [ 220.921331] Hardware name: Supermicro SYS-1028R-TDW/X10DDW-i, BIOS
> 3.2 12/16/2019
> [ 220.921357] RIP: 0010:ice_clean_rx_irq_zc+0xde/0x7d0 [ice]
> [ 220.921489] Code: 0f 84 d0 01 00 00 44 3b 7c 24 08 0f 84 a1 02 00 00
> 48 8b 53 38 41 0f b7 4d 04 4c 8b 24 c2 89 c8 81 e1 ff 3f 00 00 66 25 ff
> 3f <41> c7 44 24 34 00 00 00 00 49 8b 74 24 18 48 8d 96 00 01 00 00 49
> [ 220.921518] RSP: 0018:ffffa3eec64d0d88 EFLAGS: 00010206
> [ 220.921529] RAX: 000000000000014d RBX: ffff89bbc2aa2a00 RCX:
> 000000000000014d
> [ 220.921542] RDX: ffff89b408830000 RSI: 0000000000000040 RDI:
> ffff89bbc2aa2a00
> [ 220.921554] RBP: 0000000000000000 R08: 0000000000000000 R09:
> ffff89b407655000
> [ 220.921566] R10: 0000ffffffffffff R11: ffffa3eec64d0ff8 R12:
> ffffa3eee1637ee0
> [ 220.921578] R13: ffff89b414710000 R14: ffff89bbc7919500 R15:
> 0000000000000000
> [ 220.921591] FS: 0000000000000000(0000) GS:ffff89bb5fc80000(0000)
> knlGS:0000000000000000
> [ 220.921605] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 220.921616] CR2: ffffa3eee1637f14 CR3: 00000001d9820006 CR4:
> 00000000001706f0
> [ 220.921628] Call Trace:
> [ 220.921639] <IRQ>
> [ 220.921647] ? __die+0x20/0x70
> [ 220.921663] ? page_fault_oops+0x80/0x150
> [ 220.921676] ? exc_page_fault+0xcd/0x170
> [ 220.921690] ? asm_exc_page_fault+0x22/0x30
> [ 220.921707] ? ice_clean_rx_irq_zc+0xde/0x7d0 [ice]
> [ 220.921759] ? ice_clean_tx_irq+0x166/0x3c0 [ice]
> [ 220.921808] ice_napi_poll+0xb2/0x2a0 [ice]
> [ 220.921858] __napi_poll+0x2c/0x1b0
> [ 220.921870] net_rx_action+0x30d/0x3e0
> [ 220.921881] ? __raise_softirq_irqoff+0x18/0x80
> [ 220.921896] ? __napi_schedule+0xa6/0xc0
> [ 220.921907] ? ice_msix_clean_rings+0x4f/0x60 [ice]
> [ 220.921959] handle_softirqs+0xf0/0x2e0
> [ 220.921972] __irq_exit_rcu+0x80/0xe0
> [ 220.921983] common_interrupt+0xb7/0xd0
> [ 220.921995] </IRQ>
> [ 220.922001] <TASK>
> [ 220.922008] asm_common_interrupt+0x22/0x40
> [ 220.922022] RIP: 0010:cpuidle_enter_state+0xc8/0x420
> [ 220.922034] Code: 0e b6 3e ff e8 09 ee ff ff 8b 55 04 49 89 c5 0f 1f
> 44 00 00 31 ff e8 97 69 3d ff 45 84 ff 0f 85 38 02 00 00 fb 0f 1f 44 00
> 00 <45> 85 f6 0f 88 6a 01 00 00 49 63 d6 4c 2b 2c 24 48 8d 04 52 48 8d
> [ 220.922061] RSP: 0018:ffffa3eec4377e78 EFLAGS: 00000246
> [ 220.922072] RAX: ffff89bb5fc80000 RBX: 0000000000000004 RCX:
> 000000000000001f
> [ 220.922085] RDX: 0000000000000005 RSI: ffffffffb255a8a3 RDI:
> ffffffffb2533173
> [ 220.922098] RBP: ffff89bb5fcc0cc8 R08: 000000336fecb8ce R09:
> 0000000000000018
> [ 220.922109] R10: 000000000000453f R11: ffff89bb5fcb47e4 R12:
> ffffffffb32bdce0
> [ 220.922121] R13: 000000336fecb8ce R14: 0000000000000004 R15:
> 0000000000000000
> [ 220.922135] ? cpuidle_enter_state+0xb9/0x420
> [ 220.922147] cpuidle_enter+0x29/0x40
> [ 220.922161] cpuidle_idle_call+0x100/0x170
> [ 220.922175] do_idle+0x7d/0xd0
> [ 220.922185] cpu_startup_entry+0x25/0x30
> [ 220.922195] start_secondary+0x116/0x140
> [ 220.922206] common_startup_64+0x13e/0x141
> [ 220.922222] </TASK>
> [ 220.922229] Modules linked in: bonding(E) tls(E) nft_fib_inet(E)
> nft_fib_ipv4(E) nft_fib_ipv6(E) nft_fib(E) nft_reject_inet(E)
> nf_reject_ipv4(E) nf_reject_ipv6(E) nft_reject(E) nft_ct(E)
> nft_chain_nat(E) nf_nat(E) nf_conntr
> ack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) rfkill(E) ip_set(E)
> nf_tables(E) libcrc32c(E) nfnetlink(E) vfat(E) fat(E) intel_rapl_msr(E)
> intel_rapl_common(E) sb_edac(E) x86_pkg_temp_thermal(E)
> intel_powerclamp(E) coretemp(E) kvm
> _intel(E) ipmi_ssif(E) kvm(E) iTCO_wdt(E) intel_pmc_bxt(E)
> iTCO_vendor_support(E) rapl(E) intel_cstate(E) ast(E) intel_uncore(E)
> drm_shmem_helper(E) pcspkr(E) drm_kms_helper(E) i2c_i801(E) mei_me(E)
> i2c_mux(E) mxm_wmi(E) mei(E
> ) i2c_smbus(E) lpc_ich(E) ioatdma(E) acpi_power_meter(E) ipmi_si(E)
> acpi_ipmi(E) ipmi_devintf(E) ipmi_msghandler(E) joydev(E) acpi_pad(E)
> drm(E) fuse(E) ext4(E) mbcache(E) jbd2(E) sd_mod(E) sg(E) ice(E) ahci(E)
> crct10dif_pclmu
> l(E) crc32_pclmul(E) libahci(E) crc32c_intel(E) polyval_clmulni(E)
> polyval_generic(E) igb(E) libata(E) ghash_clmulni_intel(E)
> [ 220.922280] i2c_algo_bit(E) dca(E) libie(E) wmi(E) dm_mirror(E)
> dm_region_hash(E) dm_log(E) dm_mod(E)
> [ 220.922416] CR2: ffffa3eee1637f14
>
> [3] Kernel crash triggered by step D
>
> [ 894.619896] BUG: unable to handle page fault for address:
> ffffb5818c2d7f14
> [ 894.619921] #PF: supervisor read access in kernel mode
> [ 894.619932] #PF: error_code(0x0000) - not-present page
> [ 894.619942] PGD 100000067 P4D 100000067 PUD 100237067 PMD 0
> [ 894.619957] Oops: Oops: 0000 [#1] PREEMPT SMP PTI
> [ 894.619970] CPU: 5 UID: 0 PID: 2540 Comm: ip Kdump: loaded Tainted: G
> E 6.11.5-1.thg.836e8867d7.241031.135507.el9.x86_64 #1
> [ 894.619994] Tainted: [E]=UNSIGNED_MODULE
> [ 894.620002] Hardware name: Supermicro SYS-1028R-TDW/X10DDW-i, BIOS
> 3.2 12/16/2019
> [ 894.620014] RIP: 0010:ice_xsk_clean_rx_ring+0x37/0x110 [ice]
> [ 894.620086] Code: 55 53 48 83 ec 08 44 0f b7 af a4 00 00 00 0f b7 af
> a2 00 00 00 66 41 39 ed 74 33 48 89 fb 48 8b 4b 38 41 0f b7 c5 4c 8b 34
> c1 <41> f6 46 34 01 75 30 4c 89 f7 41 83 c5 01 e8 f6 5c c6 da 31 c0 66
> [ 894.620113] RSP: 0018:ffffb58189c376d8 EFLAGS: 00010293
> [ 894.620124] RAX: 0000000000000000 RBX: ffff92f681f6b800 RCX:
> ffff9302f2860000
> [ 894.620136] RDX: 0000000000000000 RSI: 0000000000000000 RDI:
> ffff92f681f6b800
> [ 894.620148] RBP: 00000000000007ff R08: 000000000000081f R09:
> 0000000000000000
> [ 894.620159] R10: ffff92f684dc0000 R11: 0000000000000020 R12:
> 0000000000000010
> [ 894.620171] R13: 0000000000000000 R14: ffffb5818c2d7ee0 R15:
> ffff92f681fcd740
> [ 894.620183] FS: 00007f7ee9e27740(0000) GS:ffff92fd9fc80000(0000)
> knlGS:0000000000000000
> [ 894.620196] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 894.620206] CR2: ffffb5818c2d7f14 CR3: 000000010e25e003 CR4:
> 00000000001706f0
> [ 894.620218] Call Trace:
> [ 894.620228] <TASK>
> [ 894.620236] ? __die+0x20/0x70
> [ 894.620254] ? page_fault_oops+0x80/0x150
> [ 894.620268] ? exc_page_fault+0xcd/0x170
> [ 894.620283] ? asm_exc_page_fault+0x22/0x30
> [ 894.620298] ? ice_xsk_clean_rx_ring+0x37/0x110 [ice]
> [ 894.620350] ice_clean_rx_ring+0x16e/0x190 [ice]
> [ 894.620401] ice_down+0x2f8/0x3c0 [ice]
> [ 894.620443] ice_xdp_setup_prog+0x193/0x460 [ice]
> [ 894.620485] ice_xdp+0x7a/0xb0 [ice]
> [ 894.620527] ? __pfx_ice_xdp+0x10/0x10 [ice]
> [ 894.620567] dev_xdp_install+0xc7/0x100
> [ 894.620584] dev_xdp_attach+0x205/0x5d0
> [ 894.620597] do_setlink+0x7d3/0xc20
> [ 894.620611] ? __nla_validate_parse+0x125/0x1d0
> [ 894.620626] __rtnl_newlink+0x4f7/0x630
> [ 894.620639] ? __kmalloc_cache_noprof+0x225/0x2b0
> [ 894.620652] rtnl_newlink+0x44/0x70
> [ 894.620662] rtnetlink_rcv_msg+0x15c/0x410
> [ 894.620676] ? __rmqueue_pcplist+0x5f/0x2c0
> [ 894.620686] ? __rmqueue_pcplist+0x5f/0x2c0
> [ 894.620695] ? avc_has_perm_noaudit+0x67/0xf0
> [ 894.620708] ? __pfx_rtnetlink_rcv_msg+0x10/0x10
> [ 894.620721] netlink_rcv_skb+0x57/0x100
> [ 894.620735] netlink_unicast+0x246/0x370
> [ 894.620747] netlink_sendmsg+0x1f6/0x430
> [ 894.620758] ____sys_sendmsg+0x3be/0x3f0
> [ 894.620771] ? import_iovec+0x16/0x20
> [ 894.620783] ? copy_msghdr_from_user+0x6d/0xa0
> [ 894.620795] ___sys_sendmsg+0x88/0xd0
> [ 894.620807] ? __mod_memcg_lruvec_state+0xce/0x1c0
> [ 894.620822] ? mod_objcg_state+0xc9/0x2f0
> [ 894.620833] __sys_sendmsg+0x59/0xa0
> [ 894.620844] ? syscall_trace_enter+0xfb/0x190
> [ 894.620856] do_syscall_64+0x60/0x180
> [ 894.620867] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [ 894.620881] RIP: 0033:0x7f7ee9d0f917
> [ 894.620891] Code: 0e 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b9 0f
> 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f
> 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10
> [ 894.620920] RSP: 002b:00007ffd0b9a9e58 EFLAGS: 00000246 ORIG_RAX:
> 000000000000002e
> [ 894.620935] RAX: ffffffffffffffda RBX: 000000006728b03b RCX:
> 00007f7ee9d0f917
> [ 894.620948] RDX: 0000000000000000 RSI: 00007ffd0b9a9ec0 RDI:
> 0000000000000003
> [ 894.620959] RBP: 0000000000000000 R08: 0000000000000001 R09:
> 0000000000000078
> [ 894.620971] R10: 000000000000009b R11: 0000000000000246 R12:
> 0000000000000001
> [ 894.620983] R13: 00007ffd0b9a9f70 R14: 0000000000000000 R15:
> 000055784e873040
> [ 894.620997] </TASK>
> [ 894.621004] Modules linked in: bonding(E) tls(E) nft_fib_inet(E)
> nft_fib_ipv4(E) nft_fib_ipv6(E) nft_fib(E) nft_reject_inet(E)
> nf_reject_ipv4(E) nf_reject_ipv6(E) nft_reject(E) nft_ct(E)
> nft_chain_nat(E) nf_nat(E) nf_conntr
> ack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) rfkill(E) ip_set(E)
> nf_tables(E) libcrc32c(E) nfnetlink(E) vfat(E) fat(E) intel_rapl_msr(E)
> intel_rapl_common(E) sb_edac(E) x86_pkg_temp_thermal(E)
> intel_powerclamp(E) coretemp(E) kvm
> _intel(E) ipmi_ssif(E) iTCO_wdt(E) intel_pmc_bxt(E) kvm(E)
> iTCO_vendor_support(E) rapl(E) ast(E) mei_me(E) intel_cstate(E)
> intel_uncore(E) drm_shmem_helper(E) pcspkr(E) i2c_i801(E) i2c_mux(E)
> drm_kms_helper(E) mei(E) mxm_wmi(E
> ) lpc_ich(E) i2c_smbus(E) ioatdma(E) acpi_power_meter(E) ipmi_si(E)
> acpi_ipmi(E) ipmi_devintf(E) ipmi_msghandler(E) joydev(E) acpi_pad(E)
> drm(E) fuse(E) ext4(E) mbcache(E) jbd2(E) sd_mod(E) sg(E) ice(E) ahci(E)
> crct10dif_pclmu
> l(E) crc32_pclmul(E) crc32c_intel(E) libahci(E) polyval_clmulni(E)
> polyval_generic(E) igb(E) libata(E) ghash_clmulni_intel(E)
> [ 894.621056] i2c_algo_bit(E) dca(E) libie(E) wmi(E) dm_mirror(E)
> dm_region_hash(E) dm_log(E) dm_mod(E)
> [ 894.621193] CR2: ffffb5818c2d7f14
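
For anyone reading the two traces side by side: although the call stacks differ, both faults appear to land at the same offset from a bad pointer. Decoding the marked bytes of each "Code:" line by hand gives "movl $0x0,0x34(%r12)" for [2] and "testb $0x1,0x34(%r14)" for [3]. The sketch below only redoes the address arithmetic with values copied from the logs. The 0x34 displacement comes from that hand decoding and the helper name is made up for illustration; which structure field sits at that offset is not something the logs state.

```python
# Sanity check: does base register + displacement equal the faulting address?
# Register and CR2 values are copied from the oops logs [2] and [3].
# DISP is the displacement from the hand-decoded faulting instructions
# (an assumption of this sketch, not something the kernel printed).
DISP = 0x34

CRASHES = {
    "step C, ice_clean_rx_irq_zc": {
        "base_reg": "R12",
        "base": 0xFFFFA3EEE1637EE0,
        "cr2": 0xFFFFA3EEE1637F14,
    },
    "step D, ice_xsk_clean_rx_ring": {
        "base_reg": "R14",
        "base": 0xFFFFB5818C2D7EE0,
        "cr2": 0xFFFFB5818C2D7F14,
    },
}


def fault_matches(base: int, cr2: int, disp: int = DISP) -> bool:
    """Return True if base + disp is exactly the faulting address in CR2."""
    return base + disp == cr2


if __name__ == "__main__":
    for name, crash in CRASHES.items():
        ok = fault_matches(crash["base"], crash["cr2"])
        print(f"{name}: {crash['base_reg']} + {DISP:#x} == CR2 is {ok}")
```

If the decoding is right, both paths dereference a buffer pointer that is no longer mapped (PMD 0 in both logs), once for a write and once for a read. That would be consistent with both crashes sharing one cause, though the logs alone do not prove it.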