On Mon, Mar 03, 2025 at 11:46:10AM -0500, Joe Damato wrote:
> On Fri, Feb 28, 2025 at 06:27:59PM -0800, Jakub Kicinski wrote:
> > On Thu, 27 Feb 2025 18:50:13 +0000 Joe Damato wrote:
> > > @@ -2870,9 +2883,15 @@ static void refill_work(struct work_struct *work)
> > >  	for (i = 0; i < vi->curr_queue_pairs; i++) {
> > >  		struct receive_queue *rq = &vi->rq[i];
> > >  
> > > +		rtnl_lock();
> > >  		virtnet_napi_disable(rq);
> > > +		rtnl_unlock();
> > > +
> > >  		still_empty = !try_fill_recv(vi, rq, GFP_KERNEL);
> > > +
> > > +		rtnl_lock();
> > >  		virtnet_napi_enable(rq);
> > > +		rtnl_unlock();
> > 
> > Looks to me like refill_work is cancelled _sync while holding rtnl_lock
> > from the close path. I think this could deadlock?
> 
> Good catch, thank you!
> 
> It looks like this is also the case in the failure path of
> virtnet_open.
> 
> Jason: do you have any suggestions?
> 
> It looks like in both open and close disable_delayed_refill is
> called first, before the cancel_delayed_work_sync.
> 
> Would something like this solve the problem?
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 76dcd65ec0f2..457115300f05 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -2880,6 +2880,13 @@ static void refill_work(struct work_struct *work)
>  	bool still_empty;
>  	int i;
>  
> +	spin_lock(&vi->refill_lock);
> +	if (!vi->refill_enabled) {
> +		spin_unlock(&vi->refill_lock);
> +		return;
> +	}
> +	spin_unlock(&vi->refill_lock);
> +
>  	for (i = 0; i < vi->curr_queue_pairs; i++) {
>  		struct receive_queue *rq = &vi->rq[i];

Err, I suppose this also doesn't work, because:

CPU0                                  CPU1
rtnl_lock                             (before CPU0 calls disable_delayed_refill)
virtnet_close                         refill_work
                                        rtnl_lock()
cancel_sync <= deadlock

Need to give this a bit more thought.
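
[Editor's note: the interleaving above is the classic pattern of cancelling a work item synchronously while holding a lock that the work item itself needs. The sketch below is a minimal userspace pthreads analogy, not the driver code — `big_lock`, `worker`, and `safe_close_ordering` are all made-up names; `big_lock` stands in for rtnl_lock, `worker` for refill_work, and `pthread_join` for cancel_delayed_work_sync. It only illustrates the lock ordering that avoids the hang: drop the lock before waiting for the work to finish.]

```c
#include <pthread.h>
#include <stddef.h>

/* Userspace analogy (NOT the virtio_net code). All names are illustrative. */

static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;

/* The work item: like refill_work above, it needs big_lock to make progress. */
static void *worker(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&big_lock);
	/* ... the napi disable / refill / enable work would happen here ... */
	pthread_mutex_unlock(&big_lock);
	return NULL;
}

/* The close path; returns 0 on success. The crucial point is that big_lock
 * is released BEFORE waiting for the worker. If pthread_join() were called
 * while still holding big_lock (as cancel_delayed_work_sync() is called
 * under rtnl_lock in virtnet_close), the worker could already be blocked
 * in pthread_mutex_lock() and both sides would wait forever. */
int safe_close_ordering(void)
{
	pthread_t t;

	if (pthread_create(&t, NULL, worker, NULL))
		return -1;

	pthread_mutex_lock(&big_lock);
	/* ... teardown that genuinely needs the lock ... */
	pthread_mutex_unlock(&big_lock);

	return pthread_join(t, NULL); /* safe: no locks held while waiting */
}
```

Calling safe_close_ordering() completes without hanging precisely because the wait happens lock-free; swapping the unlock and the join reintroduces the CPU0/CPU1 deadlock shown above.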