On Mon, Jan 06, 2025 at 10:47:16AM +0000, Hangbin Liu wrote:
> On Thu, Jan 02, 2025 at 11:33:34AM +0800, Jianbo Liu wrote:
> > > > Re-locking doesn't look great, glancing at the code I don't see any
> > > > obvious better workarounds. Easiest fix would be to don't let the
> > > > drivers sleep in the callbacks and then we can go back to a spin lock.
> > > > Maybe nvidia people have better ideas, I'm not familiar with this
> > > > offload.
> > >
> > > I don't know how to disable bonding sleeping since we use mutex_lock now.
> > > Hi Jianbo, do you have any idea?
> > >
> >
> > I think we should allow drivers to sleep in the callbacks. So, maybe it's
> > better to move driver's xdo_dev_state_delete out of state's spin lock.
>
> I just check the code, xfrm_dev_state_delete() and later
> dev->xfrmdev_ops->xdo_dev_state_delete(x) have too many xfrm_state x
> checks. Can we really move it out of spin lock from xfrm_state_delete()

I tried to move the mutex lock code to a work queue, but found that we
need to check (ipsec->xs == xs) in bonding. So we still need the
xfrm_state x during bond ipsec gc.

So either we add a new lock for xfrm_state, or we need to drop the spin
lock in bonding's bond_ipsec_del_sa().

Cc IPsec experts to see if they have any comments.

Background:

The xfrm_dev_state_delete() call in xfrm_state_delete() is protected by a
spin lock. But the driver delete op
dev->xfrmdev_ops->xdo_dev_state_delete(x) may sleep, e.g. in
bond_ipsec_del_sa().

How should we deal with this issue?

Thanks
Hangbin