On 04/09/2024 11:30, Maciej Fijalkowski wrote:
> On Mon, Sep 02, 2024 at 04:09:33PM +0000, Alasdair McWilliam wrote:
>> Good evening,
>>
>> Looks like commit a62c50545b4d is the culprit.
>>
>> I've produced a production-grade build of kernel 6.1.95 with commit
>> a62c50545b4d backed out. Seems I can no longer trigger the fault. I can
>> kill -9 the process while pushing 50Gbps / 14Mpps and the process is
>> just restarted and resumes like it should.
>>
>> I'm going to back out the same commit from 6.1.106 for our production
>> builds and verify that fixes the issue there too.
>>
>> Can you advise if this will be reversed in future commits, or if you
>> have an alternate fix in the wings?
>
> We've been working recently on somewhat related issues and it looks like
> not every commit from [0] has been backported.
>
> $ git log --oneline v6.1.103..v6.1.104 drivers/net/ethernet/intel/ice/
> 5a80b682e3e1 ice: add missing WRITE_ONCE when clearing ice_rx_ring::xdp_prog
> 8782f0fcb19d ice: replace synchronize_rcu with synchronize_net
> 15115033f056 ice: don't busy wait for Rx queue disable in ice_qp_dis()
> 3dbc58774e58 ice: respect netif readiness in AF_XDP ZC related ndo's
>
> can you apply the rest of it on top of 6.1.107 and see the result?

The first one I've attempted doesn't apply cleanly to 6.1.107. Eg:

d59227179949 ("ice: modify error handling when setting XSK pool in ndo_bpf").

The above looks to have been based on code from around 6.8 or 6.9 where
the makeup of routines like ice_qp_ena() has changed. Looks like this
happened around a292ba981324 ("ice: make ice_vsi_cfg_txq() static").

Should I try and apply a292ba981324 as well?