On Mon, Jan 20, 2025 at 04:16:49PM +0000, Cosmin Ratiu wrote: > On Fri, 2025-01-17 at 08:54 +0100, Steffen Klassert wrote: > > > > > > Hi Jianbo, > > > > > > I talked with Sabrina and it looks we can't simply do this. Because > > > both > > > xfrm_add_sa_expire() and xfrm_timer_handler() calling > > > __xfrm_state_delete() under > > > spin lock. If we move the xfrm_dev_state_delete() out of > > > __xfrm_state_delete(), > > > all the places need to be handled correctly. > > > > > > At the same time xfrm_timer_handler() calling > > > xfrm_dev_state_update_stats before > > > __xfrm_state_delete(). Should we also take care of it to make sure > > > the state > > > change and delete are called at the same time? > > > > > > Hi Steffen, do you have any comments? > > > > Can't you just fix this in bonding? xfrm_timer_handler() can't sleep > > anyway, even if you remove the spinlock, it is a timer function. > > > > I am not sure this can be fixed in bonding given that the > xdo_dev_state_delete op could, in the general case, sleep while talking > to the hardware. I don't think it's reasonable to expect devices to > offload xfrm while the kernel holds a spinlock. > Bonding just exposed this assumption mismatch because of the mutex that > was added to replace a spinlock which exhibited the same problem we are > talking about here. > > Do the dev offload operations need to be synchronous? Couldn't > __xfrm_state_delete instead schedule a wq to do the dev offload? I saw > there's already an xfrm_state_gc_task that's invoked to call > xfrm_dev_state_free, perhaps that could be used to do the delete as > well? Yes, I have tried to move the bonding gc work in bond_ipsec_del_sa() to a wq in https://lore.kernel.org/netdev/Z33nEKg4PxwReUu_@fedora/. i.e. move the following part out of spin lock via wq. mutex_lock(&bond->ipsec_lock); list_for_each_entry(ipsec, &bond->ipsec_list, list) { if (ipsec->xs == xs) { list_del(&ipsec->list); kfree(ipsec); break; } } mutex_unlock(&bond->ipsec_lock); But we can see there is an (ipsec->xs == xs). So we still need to make sure the xs is not released. Can we add a xs reference in bond_ipsec_del_sa() to achieve this? Thanks Hangbin