On Wed, 2025-02-26 at 12:07 +0000, Hangbin Liu wrote: > > During bonding testing, we also found a case that would trigger > the WARN_ON(xs->xso.real_dev != real_dev). > > If we create active-backup mode bonding and create ipsec tunnel over > bonding device, then remove bonding device. There is a possibility > that > the bond call bond_ipsec_del_sa_all() to delete the ipsec state > first, > then change active slave to another interface. > > At the same time, ipsec gc was called and then bond_ipsec_free_sa(). > This will cause the xs->xso.real_dev != active_slave as the failover > triggered. The call traces looks like: > [..] > > This seems like another situation that could not simply fit > 3. "if (xs->xso.real_dev != real_dev), goto out. > I'm not sure what's the xs->km.state should be during > xfrm_state_gc_task(). > Is it also set to XFRM_STATE_DEAD, because I didn't see it. XFRM_STATE_DEAD is set in __xfrm_state_delete() (and other places for what seems like error conditions), plus there's a WARN_ON(x->km.state != XFRM_STATE_DEAD) in __xfrm_state_destroy(). This last function is the main way xfrm states are destroyed, besides xfrm_dev_state_flush and xfrm_state_find (where xfrm_state_delete + xfrm_dev_state_free are used directly). So I am pretty sure that when bond .xdo_dev_state_free() is called via either one of the above three mechanisms, the state should be XFRM_STATE_DEAD. But maybe I'm missing something. > > Especially if the bond change active slave and xfrm_state_gc_task() > run > in parallel, like > > bond_ipsec_del_sa_all() > xfrm_state_gc_task() > xfrm_dev_state_free() > bond_ipsec_free_sa() > bond_ipsec_add_sa_all() > > If the xs->km.state is not XFRM_STATE_DEAD. How to avoid the > WARN_ON(xs->xso.real_dev != real_dev) in bond_ipsec_free_sa() > and how to make bond_ipsec_add_sa_all() not added the entry again. I am proposing you change this WARN_ON to an if, avoid calling xdo_dev_state_free on real_dev in that case and just remove the entry from bond->ipsec. Cosmin.