On Tue, Mar 11, 2025 at 09:08:49PM +0000, Cosmin Ratiu wrote:
> On Fri, 2025-03-07 at 09:03 -0800, Jakub Kicinski wrote:
> > On Fri, 7 Mar 2025 09:42:49 +0200 Nikolay Aleksandrov wrote:
> > > TBH, keeping buggy code with a comment doesn't sound good to me.
> > > I'd rather remove this support than tell people "good luck, it
> > > might crash". It's better to be safe until a correct design is in
> > > place which takes care of these issues.
> >
> > That's my feeling too, FWIW. I think we knew about this issue
> > for a while now, the longer we wait the more users we may disrupt
> > with the revert.
>
> These are preexisting races between the bond link failover and the user
> removing the xfrm states. Unless the user wants to intentionally
> trigger these bugs, chances are nobody has ever encountered them in the
> wild in normal operation. In steady state, bond link failover works,
> and adding/removing states works. It's the combination of the two
> control plane events that may have a chance to double free or leak
> states.
>
> I would not pull everything out just yet.
>
> Today, I managed to find a solution for these races (I think), based on
> a patch series I am preparing against ipsec-next with other changes
> related to real_dev.
>
> Hangbin, do you mind if I take over fixing the locking issue as part of
> my series? I plan to send it upstream the following days.

No, I don't mind. Please go ahead and fix the locking issue. And thanks
a lot for your review.

Regards
Hangbin