Re: [PATCH 1/1] virtio_net: Add timeout handler to avoid kernel hang

Xuan Zhuo <xuanzhuo@xxxxxxxxxxxxxxxxx> · Mon, 22 Jan 2024 15:25:05 +0800

On Mon, 22 Jan 2024 15:19:12 +0800, Jason Wang <jasowang@xxxxxxxxxx> wrote:
> On Mon, Jan 22, 2024 at 3:07 PM Xuan Zhuo <xuanzhuo@xxxxxxxxxxxxxxxxx> wrote:
> >
> > On Mon, 22 Jan 2024 14:58:09 +0800, Jason Wang <jasowang@xxxxxxxxxx> wrote:
> > > On Mon, Jan 22, 2024 at 2:55 PM Jason Wang <jasowang@xxxxxxxxxx> wrote:
> > > >
> > > > On Mon, Jan 22, 2024 at 2:20 PM Xuan Zhuo <xuanzhuo@xxxxxxxxxxxxxxxxx> wrote:
> > > > >
> > > > > On Mon, 22 Jan 2024 12:16:27 +0800, Jason Wang <jasowang@xxxxxxxxxx> wrote:
> > > > > > On Mon, Jan 22, 2024 at 12:00 PM Xuan Zhuo <xuanzhuo@xxxxxxxxxxxxxxxxx> wrote:
> > > > > > >
> > > > > > > On Mon, 22 Jan 2024 11:14:30 +0800, Jason Wang <jasowang@xxxxxxxxxx> wrote:
> > > > > > > > On Mon, Jan 22, 2024 at 10:12 AM Zhu Yanjun <yanjun.zhu@xxxxxxxxx> wrote:
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > 在 2024/1/20 1:29, Andrew Lunn 写道:
> > > > > > > > > >>>>>        while (!virtqueue_get_buf(vi->cvq, &tmp) &&
> > > > > > > > > >>>>> -           !virtqueue_is_broken(vi->cvq))
> > > > > > > > > >>>>> +           !virtqueue_is_broken(vi->cvq)) {
> > > > > > > > > >>>>> +        if (timeout)
> > > > > > > > > >>>>> +            timeout--;
> > > > > > > > > >>>> This is not really a timeout, just a loop counter. 200 iterations could
> > > > > > > > > >>>> be a very short time on reasonable H/W. I guess this avoid the soft
> > > > > > > > > >>>> lockup, but possibly (likely?) breaks the functionality when we need to
> > > > > > > > > >>>> loop for some non negligible time.
> > > > > > > > > >>>>
> > > > > > > > > >>>> I fear we need a more complex solution, as mentioned by Micheal in the
> > > > > > > > > >>>> thread you quoted.
> > > > > > > > > >>> Got it. I also look forward to the more complex solution to this problem.
> > > > > > > > > >> Can we add a device capability (new feature bit) such as ctrq_wait_timeout
> > > > > > > > > >> to get a reasonable timeout？
> > > > > > > > > > The usual solution to this is include/linux/iopoll.h. If you can sleep
> > > > > > > > > > read_poll_timeout() otherwise read_poll_timeout_atomic().
> > > > > > > > >
> > > > > > > > > I read carefully the functions read_poll_timeout() and
> > > > > > > > > read_poll_timeout_atomic(). The timeout is set by the caller of the 2
> > > > > > > > > functions.
> > > > > > > >
> > > > > > > > FYI, in order to avoid a swtich of atomic or not, we need convert rx
> > > > > > > > mode setting to workqueue first:
> > > > > > > >
> > > > > > > > https://www.mail-archive.com/virtualization@xxxxxxxxxxxxxxxxxxxxxxxxxx/msg60298.html
> > > > > > > >
> > > > > > > > >
> > > > > > > > > As such, can we add a module parameter to customize this timeout value
> > > > > > > > > by the user?
> > > > > > > >
> > > > > > > > Who is the "user" here, or how can the "user" know the value?
> > > > > > > >
> > > > > > > > >
> > > > > > > > > Or this timeout value is stored in device register, virtio_net driver
> > > > > > > > > will read this timeout value at initialization?
> > > > > > > >
> > > > > > > > See another thread. The design needs to be general, or you can post a RFC.
> > > > > > > >
> > > > > > > > In another thought, we've already had a tx watchdog, maybe we can have
> > > > > > > > something similar to cvq and use timeout + reset in that case.
> > > > > > >
> > > > > > > But we may block by the reset ^_^ if the device is broken?
> > > > > >
> > > > > > I mean vq reset here.
> > > > >
> > > > > I see.
> > > > >
> > > > > I mean when the deivce is broken, the vq reset also many be blocked.
> > > > >
> > > > >         void vp_modern_set_queue_reset(struct virtio_pci_modern_device *mdev, u16 index)
> > > > >         {
> > > > >                 struct virtio_pci_modern_common_cfg __iomem *cfg;
> > > > >
> > > > >                 cfg = (struct virtio_pci_modern_common_cfg __iomem *)mdev->common;
> > > > >
> > > > >                 vp_iowrite16(index, &cfg->cfg.queue_select);
> > > > >                 vp_iowrite16(1, &cfg->queue_reset);
> > > > >
> > > > >                 while (vp_ioread16(&cfg->queue_reset))
> > > > >                         msleep(1);
> > > > >
> > > > >                 while (vp_ioread16(&cfg->cfg.queue_enable))
> > > > >                         msleep(1);
> > > > >         }
> > > > >         EXPORT_SYMBOL_GPL(vp_modern_set_queue_reset);
> > > > >
> > > > > In this function, for the broken device, we can not expect something.
> > > >
> > > > Yes, it's best effort, there's no guarantee then. But it doesn't harm to try.
> > > >
> > > > Thanks
> > > >
> > > > >
> > > > >
> > > > > >
> > > > > > It looks like we have multiple goals here
> > > > > >
> > > > > > 1) avoid lockups, using workqueue + cond_resched() seems to be
> > > > > > sufficient, it has issue but nothing new
> > > > > > 2) recover from the unresponsive device, the issue for timeout is that
> > > > > > it needs to deal with false positives
> > > > >
> > > > >
> > > > > I agree.
> > > > >
> > > > > But I want to add a new goal, cvq async. In the netdim, we will
> > > > > send many requests via the cvq, so the cvq async will be nice.
> > >
> > > Then you need an interrupt for cvq.
> > >
> > > FYI, I've posted a series that use interrupt for cvq in the past:
> > >
> > > https://lore.kernel.org/lkml/6026e801-6fda-fee9-a69b-d06a80368621@xxxxxxxxxx/t/
> >
> > I know this. But the interrupt maybe not a good solution without new space.
>
> What do you mean by "new space"?

Yes, I know, the cvq can work with interrupt by the virtio spec.
But as I know, many hypervisors implement the cvq without supporting interrupt.
If we let the cvq work with interrupt without negotiation then
many hypervisors will hang on the new kernel.

>
> We can introduce something like enable_cb_delayed(), then you will
> only get notified after several requests.
>
> >
> > >
> > > Haven't found time in working on this anymore, maybe we can start from
> > > this or not.
> >
> >
> > I said async, but my aim is to put many requests to the cvq before getting the
> > response.
>
> It doesn't differ from TX/RX in this case.
>
> >
> > Heng Qi posted this https://lore.kernel.org/all/1705410693-118895-4-git-send-email-hengqi@xxxxxxxxxxxxxxxxx/
> >
>
> This seems like a hack, if interrupt is used, you can simply do that
> in the callback.

YES.

I also want to change the code, I just want to say the async is a goal.

For the rx mode, we have introduce a work queue, I want to move the
sending command job to the work queue. The caller just wakeup
the work queue.

If the caller want to got the result sync, then the caller can wait for it.
If not, the caller can register an function to the work queue.

And I think it will be easy to implement the timeout inside the workqueue.

Thanks.

>
> Thanks
>
> > Thanks.
> >
> >
> > >
> > > Thanks
> > >
> > > > >
> > > > > Thanks.
> > > > >
> > > > >
> > > > > >
> > > > > > Thanks
> > > > > >
> > > > > > >
> > > > > > > Thanks.
> > > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > Thans
> > > > > > > >
> > > > > > > > >
> > > > > > > > > Zhu Yanjun
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >       Andrew
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > >
> >
>