On Wed, Mar 05, 2025 at 04:12:18PM +0000, Cosmin Ratiu wrote:
> On Wed, 2025-03-05 at 14:13 +0000, Hangbin Liu wrote:
> > On Wed, Mar 05, 2025 at 10:38:36AM +0200, Nikolay Aleksandrov wrote:
> > > > @@ -617,8 +614,18 @@ static void bond_ipsec_del_sa_all(struct bonding *bond)
> > > >  
> > > >  	mutex_lock(&bond->ipsec_lock);
> > > >  	list_for_each_entry(ipsec, &bond->ipsec_list, list) {
> > >
> > > Second time - you should use list_for_each_entry_safe if you're
> > > walking and deleting elements from the list.
> >
> > Sorry, I missed this comment. I will update in next version.
> >
> > >
> > > > +		spin_lock_bh(&ipsec->xs->lock);
> > > >  		if (!ipsec->xs->xso.real_dev)
> > > > -			continue;
> > > > +			goto next;
> > > > +
> > > > +		if (ipsec->xs->km.state == XFRM_STATE_DEAD) {
> > > > +			/* already dead no need to delete again */
> > > > +			if (real_dev->xfrmdev_ops->xdo_dev_state_free)
> > > > +				real_dev->xfrmdev_ops->xdo_dev_state_free(ipsec->xs);
> > >
> > > Have you checked if .xdo_dev_state_free can sleep?
> > > I see at least one that can: mlx5e_xfrm_free_state().
> >
> > Hmm, This brings us back to the initial problem. We tried to avoid
> > calling a spin lock in a sleep context (bond_ipsec_del_sa), but now
> > the new code encounters this issue again.
>
> The reason the mutex was added (instead of the spinlock used before)
> was exactly because the add and free offload operations could sleep.
>
> > With your reply, I also checked the xdo_dev_state_add() in
> > bond_ipsec_add_sa_all(), which may also sleep, e.g.
> > mlx5e_xfrm_add_state(),
> >
> > If we unlock the spin lock, then the race came back again.
> >
> > Any idea about this?
>
> The race is between bond_ipsec_del_sa_all and bond_ipsec_del_sa (plus
> bond_ipsec_free_sa). The issue is that when bond_ipsec_del_sa_all
> releases x->lock, bond_ipsec_del_sa can immediately be called, followed
> by bond_ipsec_free_sa.
> Maybe dropping x->lock after setting real_dev to NULL? I checked,
> real_dev is not used anywhere on the free calls, I think. I have
> another series refactoring things around real_dev, I hope to be able to
> send it soon.
>
> Here's a sketch of this idea:
>
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -613,8 +613,11 @@ static void bond_ipsec_del_sa_all(struct bonding *bond)
>  
>  	mutex_lock(&bond->ipsec_lock);
>  	list_for_each_entry(ipsec, &bond->ipsec_list, list) {
> -		if (!ipsec->xs->xso.real_dev)
> +		spin_lock(&ipsec->x->lock);
> +		if (!ipsec->xs->xso.real_dev) {
> +			spin_unlock(&ipsec->x->lock);
>  			continue;
> +		}
>  
>  		if (!real_dev->xfrmdev_ops ||
>  		    !real_dev->xfrmdev_ops->xdo_dev_state_delete ||
> @@ -622,12 +625,16 @@ static void bond_ipsec_del_sa_all(struct bonding *bond)
>  			slave_warn(bond_dev, real_dev,
>  				   "%s: no slave xdo_dev_state_delete\n",
>  				   __func__);
> -		} else {
> -			real_dev->xfrmdev_ops->xdo_dev_state_delete(real_dev, ipsec->xs);
> -			if (real_dev->xfrmdev_ops->xdo_dev_state_free)
> -				real_dev->xfrmdev_ops->xdo_dev_state_free(ipsec->xs);
> -			ipsec->xs->xso.real_dev = NULL;
> +			spin_unlock(&ipsec->x->lock);
> +			continue;
>  		}
> +
> +		real_dev->xfrmdev_ops->xdo_dev_state_delete(real_dev, ipsec->xs);
> +		ipsec->xs->xso.real_dev = NULL;

Setting xs->xso.real_dev = NULL is a good idea, since bond_ipsec_del_sa()
and bond_ipsec_free_sa() bail out when xs->xso.real_dev is NULL.

For bond_ipsec_add_sa_all(), I will move the xso.real_dev = real_dev
assignment after .xdo_dev_state_add() to avoid the following race:

bond_ipsec_add_sa_all()
spin_unlock(&ipsec->xs->lock);
ipsec->xs->xso.real_dev = real_dev;
                                        __xfrm_state_delete()
                                          x->state = DEAD
                                          - bond_ipsec_del_sa()
                                            - .xdo_dev_state_delete()
.xdo_dev_state_add()

Here .xdo_dev_state_delete() runs before the state has been added to the
device, so the later .xdo_dev_state_add() leaves an already-deleted state
offloaded.

Thanks
Hangbin

> +		/* Unlock before freeing device state, it could sleep. */
> +		spin_unlock(&ipsec->x->lock);
> +		if (real_dev->xfrmdev_ops->xdo_dev_state_free)
> +			real_dev->xfrmdev_ops->xdo_dev_state_free(ipsec->xs);
>
> Cosmin.
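A minimal userspace model of the two orderings discussed above may help. It is a sketch under stated assumptions, not bonding code: a pthread mutex stands in for x->lock, struct sa, struct dev and every helper name are hypothetical, and the dev_state_* stubs stand in for the .xdo_dev_state_* callbacks, which in a real driver may sleep. teardown_sa() collapses either bond_ipsec_del_sa_all() or the user-triggered bond_ipsec_del_sa()/bond_ipsec_free_sa() pair into one helper: it detaches real_dev under the lock and calls the free stub only after dropping it. add_then_publish() follows the proposed add order, publishing real_dev (assumed here to happen under the lock) only after the add stub succeeds.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a slave device: counts SAs offloaded to it. */
struct dev {
	int offloaded;
};

/* Hypothetical model of an xfrm state. */
struct sa {
	pthread_mutex_t lock;	/* stands in for x->lock */
	struct dev *real_dev;	/* stands in for x->xso.real_dev */
};

/* Stubs for .xdo_dev_state_add/_delete/_free (may sleep in a real driver). */
static int dev_state_add(struct dev *d, struct sa *s)
{
	(void)s;
	d->offloaded++;
	return 0;
}

static void dev_state_delete(struct dev *d, struct sa *s)
{
	(void)d;
	(void)s;
}

static void dev_state_free(struct dev *d, struct sa *s)
{
	(void)s;
	d->offloaded--;
}

/*
 * Either teardown path: detach real_dev under the lock, then free with the
 * lock dropped. A caller that finds real_dev == NULL knows another path
 * already owns the teardown and does nothing. Returns true if it freed.
 */
static bool teardown_sa(struct sa *s)
{
	struct dev *d;

	pthread_mutex_lock(&s->lock);
	d = s->real_dev;
	if (!d) {
		pthread_mutex_unlock(&s->lock);
		return false;
	}
	dev_state_delete(d, s);
	s->real_dev = NULL;
	pthread_mutex_unlock(&s->lock);

	dev_state_free(d, s);	/* lock no longer held, sleeping is allowed */
	return true;
}

/* Proposed add order: offload first (may sleep), then publish real_dev. */
static int add_then_publish(struct sa *s, struct dev *d)
{
	int err = dev_state_add(d, s);

	if (err)
		return err;
	pthread_mutex_lock(&s->lock);
	s->real_dev = d;
	pthread_mutex_unlock(&s->lock);
	return 0;
}
```

Running teardown_sa() twice on the same SA frees the device state once: the second call finds real_dev == NULL and returns false. The model does not cover an SA deleted while the add call is in flight; that window may need its own check after real_dev is published.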