Hi Cosmin, On Fri, Feb 28, 2025 at 10:31:58AM +0000, Cosmin Ratiu wrote: > On Fri, 2025-02-28 at 02:20 +0000, Hangbin Liu wrote: > > On Thu, Feb 27, 2025 at 03:31:01PM +0200, Nikolay Aleksandrov wrote: > > > > > One more thing - note I'm not an xfrm expert by far but it > > > > > seems to me here you have > > > > > to also call xdo_dev_state_free() with the old active slave > > > > > dev otherwise that will > > > > > never get called with the original real_dev after the switch to > > > > > a new > > > > > active slave (or more accurately it might if the GC runs > > > > > between the switching > > > > > but it is a race), care must be taken wrt sequence of events > > > > > because the XFRM > > > > > > > > Can we just call xs->xso.real_dev->xfrmdev_ops- > > > > >xdo_dev_state_free(xs) > > > > no matter xs->xso.real_dev == real_dev or not? I'm afraid calling > > > > xdo_dev_state_free() every where may make us lot more easily. > > > > > > > > > > You'd have to check all drivers that implement the callback to > > > answer that and even then > > > I'd stick to the canonical way of how it's done in xfrm and make > > > the bond just passthrough. > > > Any other games become dangerous and new code will have to be > > > carefully reviewed every > > > time, calling another device's free_sa when it wasn't added before > > > doesn't sound good. > > > > > > > > GC may be running in parallel which probably means that in > > > > > bond_ipsec_free_sa() > > > > > you'll have to take the mutex before calling > > > > > xdo_dev_state_free() and check > > > > > if the entry is still linked in the bond's ipsec list before > > > > > calling the free_sa > > > > > callback, if it isn't then del_sa_all got to it before the GC > > > > > and there's nothing > > > > > to do if it also called the dev's free_sa callback. The check > > > > > for real_dev doesn't > > > > > seem enough to protect against this race. > > > > > > > > I agree that we need to take the mutex before calling > > > > xdo_dev_state_free() > > > > in bond_ipsec_free_sa(). Do you think if this is enough? I'm a > > > > bit lot here. > > > > > > > > Thanks > > > > Hangbin > > > > > > Well, the race is between the xfrm GC and del_sa_all, in bond's > > > free_sa if you > > > walk the list under the mutex before calling real_dev's free > > > callback and > > > don't find the current element that's being freed in free_sa then > > > it was > > > cleaned up by del_sa_all, otherwise del_sa_all is waiting to walk > > > that > > > list and clean the entries. I think it should be fine as long as > > > free_sa > > > was called once with the proper device. > > > > OK, so the free will be called either in del_sa_all() or free_sa(). > > Something like this? > > > [...] > > Unfortunately, after applying these changes and reasoning about them > for a bit, I don't think this will work. There are still races left. > For example: > 1. An xs is marked DEAD (in __xfrm_state_delete, with x->lock held) and > before .xdo_dev_state_delete() is called on it, bond_ipsec_del_sa_all > is called in parallel, doesn't call delete on xs (because it's dead), > then calls free (incorrect without delete first), then removes the list > entry. Later, xdo_dev_state_delete( == bond_ipsec_del_sa) is called, > and calls delete (incorrect, out of order with free). Finally, > bond_ipsec_free_sa is called, which fortunately doesn't do anything > silly in the new proposed form because xs is no longer in the list. > > 2. A more sinister form of the above race can happen when > bond_ipsec_del_sa_all() calls delete on real_dev, then in parallel and > immediately after __xfrm_state_delete marks xs as DEAD and calls > bond_ipsec_del_sa() which happily calls delete on real_dev again. > > In order to fix these races (and others like it), I think > bond_ipsec_del_sa_all and bond_ipsec_add_sa_all *need* to acquire x- > >lock for each xs being processed. This would prevent xfrm from > concurrently initiating add/delete operations on the managed states. > Thanks a lot for the careful checking. I will add the x->lock in del/add_sa_all. Regards Hangbin