Re: [PATCH V2] virtio-net: fix the race between refill work and close

Jason Wang <jasowang@xxxxxxxxxx> · Thu, 30 Jun 2022 14:15:34 +0800

On Thu, Jun 30, 2022 at 2:07 PM Jason Wang <jasowang@xxxxxxxxxx> wrote:
>
> On Thu, Jun 30, 2022 at 10:51 AM Jakub Kicinski <kuba@xxxxxxxxxx> wrote:
> >
> > On Thu, 30 Jun 2022 10:08:04 +0800 Jason Wang wrote:
> > > +static void enable_refill_work(struct virtnet_info *vi)
> > > +{
> > > +     spin_lock(&vi->refill_lock);
> > > +     vi->refill_work_enabled = true;
> > > +     spin_unlock(&vi->refill_lock);
> > > +}
> > > +
> > > +static void disable_refill_work(struct virtnet_info *vi)
> > > +{
> > > +     spin_lock(&vi->refill_lock);
> > > +     vi->refill_work_enabled = false;
> > > +     spin_unlock(&vi->refill_lock);
> > > +}
> > > +
> > >  static void virtqueue_napi_schedule(struct napi_struct *napi,
> > >                                   struct virtqueue *vq)
> > >  {
> > > @@ -1527,8 +1547,12 @@ static int virtnet_receive(struct receive_queue *rq, int budget,
> > >       }
> > >
> > >       if (rq->vq->num_free > min((unsigned int)budget, virtqueue_get_vring_size(rq->vq)) / 2) {
> > > -             if (!try_fill_recv(vi, rq, GFP_ATOMIC))
> > > -                     schedule_delayed_work(&vi->refill, 0);
> > > +             if (!try_fill_recv(vi, rq, GFP_ATOMIC)) {
> > > +                     spin_lock(&vi->refill_lock);
> > > +                     if (vi->refill_work_enabled)
> > > +                             schedule_delayed_work(&vi->refill, 0);
> > > +                     spin_unlock(&vi->refill_lock);
> >
> > Are you sure you can use the basic spin_lock() flavor in all cases?
> > Isn't the disable/enable called from a different context than this
> > thing here?
>
> This function will only be called in bh so it's safe.

Ok, so it looks like we should use the bh variant in close. Otherwise
we may have a deadlock. Will fix it.

Thanks

>
> >
> > The entire delayed work construct seems a little risky because the work
> > may go to sleep after disabling napi, causing large latency spikes.
>
> Yes, but it only happens on OOM.
>
> > I guess you must have a good reason no to simply reschedule the NAPI
> > and keep retrying with GFP_ATOMIC...
>
> Less pressure on the memory allocator on OOM probably, but it looks
> like an independent issue that might be optimized in the future.
>
> >
> > Please add the target tree name to the subject.
>
> Ok
>
> Thanks
>
> >

_______________________________________________
Virtualization mailing list
Virtualization@xxxxxxxxxxxxxxxxxxxxxxxxxx
https://lists.linuxfoundation.org/mailman/listinfo/virtualization