On Thu, Feb 27, 2025 at 03:31:01PM +0200, Nikolay Aleksandrov wrote: > >> One more thing - note I'm not an xfrm expert by far but it seems to me here you have > >> to also call xdo_dev_state_free() with the old active slave dev otherwise that will > >> never get called with the original real_dev after the switch to a new > >> active slave (or more accurately it might if the GC runs between the switching > >> but it is a race), care must be taken wrt sequence of events because the XFRM > > > > Can we just call xs->xso.real_dev->xfrmdev_ops->xdo_dev_state_free(xs) > > no matter xs->xso.real_dev == real_dev or not? I'm afraid calling > > xdo_dev_state_free() every where may make us lot more easily. > > > > You'd have to check all drivers that implement the callback to answer that and even then > I'd stick to the canonical way of how it's done in xfrm and make the bond just passthrough. > Any other games become dangerous and new code will have to be carefully reviewed every > time, calling another device's free_sa when it wasn't added before doesn't sound good. > > >> GC may be running in parallel which probably means that in bond_ipsec_free_sa() > >> you'll have to take the mutex before calling xdo_dev_state_free() and check > >> if the entry is still linked in the bond's ipsec list before calling the free_sa > >> callback, if it isn't then del_sa_all got to it before the GC and there's nothing > >> to do if it also called the dev's free_sa callback. The check for real_dev doesn't > >> seem enough to protect against this race. > > > > I agree that we need to take the mutex before calling xdo_dev_state_free() > > in bond_ipsec_free_sa(). Do you think if this is enough? I'm a bit lot here. > > > > Thanks > > Hangbin > > Well, the race is between the xfrm GC and del_sa_all, in bond's free_sa if you > walk the list under the mutex before calling real_dev's free callback and > don't find the current element that's being freed in free_sa then it was > cleaned up by del_sa_all, otherwise del_sa_all is waiting to walk that > list and clean the entries. I think it should be fine as long as free_sa > was called once with the proper device. OK, so the free will be called either in del_sa_all() or free_sa(). Something like this? static void bond_ipsec_del_sa_all(struct bonding *bond) @@ -620,6 +614,16 @@ static void bond_ipsec_del_sa_all(struct bonding *bond) if (!ipsec->xs->xso.real_dev) continue; + if (ipsec->xs->km.state == XFRM_STATE_DEAD) { + /* already dead no need to delete again */ + if (real_dev->xfrmdev_ops->xdo_dev_state_free) + real_dev->xfrmdev_ops->xdo_dev_state_free(ipsec->xs); + list_del(&ipsec->list); + kfree(ipsec); + continue; + } + if (!real_dev->xfrmdev_ops || !real_dev->xfrmdev_ops->xdo_dev_state_delete || netif_is_bond_master(real_dev)) { @@ -659,11 +664,22 @@ static void bond_ipsec_free_sa(struct xfrm_state *xs) if (!xs->xso.real_dev) goto out; - WARN_ON(xs->xso.real_dev != real_dev); + mutex_lock(&bond->ipsec_lock); + list_for_each_entry(ipsec, &bond->ipsec_list, list) { + if (ipsec->xs == xs) { + if (real_dev && xs->xso.real_dev == real_dev && ^^ looks we don't need this xs->xso.real_dev == real_dev checking if there is no race, do we? Or just keep the WARN_ON() in case of any race. + real_dev->xfrmdev_ops && + real_dev->xfrmdev_ops->xdo_dev_state_free) + real_dev->xfrmdev_ops->xdo_dev_state_free(xs); + list_del(&ipsec->list); + kfree(ipsec); + break; + } + } + mutex_unlock(&bond->ipsec_lock); - if (real_dev && real_dev->xfrmdev_ops && - real_dev->xfrmdev_ops->xdo_dev_state_free) - real_dev->xfrmdev_ops->xdo_dev_state_free(xs); out: netdev_put(real_dev, &tracker); } -- 2.39.5 (Apple Git-154) Thanks Hangbin