Re: [PATCH 1/1] vhost: Protect the virtqueue from being cleared whilst still in use

Stefano Garzarella <sgarzare@xxxxxxxxxx> · Wed, 2 Mar 2022 16:36:43 +0100

On Wed, Mar 02, 2022 at 09:50:38AM -0500, Michael S. Tsirkin wrote:
> On Wed, Mar 02, 2022 at 03:11:21PM +0100, Stefano Garzarella wrote:
> > On Wed, Mar 02, 2022 at 08:35:08AM -0500, Michael S. Tsirkin wrote:
> > > On Wed, Mar 02, 2022 at 10:34:46AM +0100, Stefano Garzarella wrote:
> > > > On Wed, Mar 02, 2022 at 07:54:21AM +0000, Lee Jones wrote:
> > > > > vhost_vsock_handle_tx_kick() already holds the mutex during its call
> > > > > to vhost_get_vq_desc().  All we have to do is take the same lock
> > > > > during virtqueue clean-up and we mitigate the reported issues.
> > > > >
> > > > > Link: https://syzkaller.appspot.com/bug?extid=279432d30d825e63ba00
> > > >
> > > > This issue is similar to [1] that should be already fixed upstream by [2].
> > > >
> > > > However I think this patch would have prevented some issues, because
> > > > vhost_vq_reset() sets vq->private to NULL, preventing the worker from
> > > > running.
> > > >
> > > > Anyway I think that when we enter in vhost_dev_cleanup() the worker should
> > > > be already stopped, so it shouldn't be necessary to take the mutex. But in
> > > > order to prevent future issues maybe it's better to take them, so:
> > > >
> > > > Reviewed-by: Stefano Garzarella <sgarzare@xxxxxxxxxx>
> > > >
> > > > [1]
> > > > https://syzkaller.appspot.com/bug?id=993d8b5e64393ed9e6a70f9ae4de0119c605a822
> > > > [2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a58da53ffd70294ebea8ecd0eb45fd0d74add9f9
> > >
> > >
> > > Right. I want to queue this but I would like to get a warning
> > > so we can detect issues like [2] before they cause more issues.
> >
> > I agree, what about moving the warning that we already have higher up, right
> > at the beginning of the function?
> >
> > I mean something like this:
> >
> > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> > index 59edb5a1ffe2..1721ff3f18c0 100644
> > --- a/drivers/vhost/vhost.c
> > +++ b/drivers/vhost/vhost.c
> > @@ -692,6 +692,8 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
> >  {
> >         int i;
> > +       WARN_ON(!llist_empty(&dev->work_list));
> > +
> >         for (i = 0; i < dev->nvqs; ++i) {
> >                 if (dev->vqs[i]->error_ctx)
> >                         eventfd_ctx_put(dev->vqs[i]->error_ctx);
> > @@ -712,7 +714,6 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
> >         dev->iotlb = NULL;
> >         vhost_clear_msg(dev);
> >         wake_up_interruptible_poll(&dev->wait, EPOLLIN | EPOLLRDNORM);
> > -       WARN_ON(!llist_empty(&dev->work_list));
> >         if (dev->worker) {
> >                 kthread_stop(dev->worker);
> >                 dev->worker = NULL;
> >
>
> Hmm I'm not sure why it matters.

Because with this new patch, which takes the vq mutexes inside the loop,
the workers should already be stopped by the time the loop finishes, since
vhost_vq_reset() sets vq->private_data to NULL.

But the best thing IMHO is to check that no backend is set for any vq, so
we know the workers have been stopped correctly at this point.
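
To make the idea concrete, here is a rough userspace model of such a check.
It is only a sketch, not kernel code: struct model_vq, struct model_dev and
model_count_live_backends() are hypothetical stand-ins I made up for
illustration, and a pthread mutex plays the role of vq->mutex. In the kernel
I would expect a WARN_ON() per vq instead of the counter and the fprintf().

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Simplified stand-ins for the kernel structures (assumption: only the
 * fields relevant to this discussion are modelled).
 */
struct model_vq {
	pthread_mutex_t mutex;	/* plays the role of vq->mutex */
	void *private_data;	/* backend; NULL once the vq is stopped */
};

struct model_dev {
	struct model_vq **vqs;
	int nvqs;
};

/*
 * Hypothetical helper: return how many vqs still have a backend set.
 * A non-zero result means some worker may not have been stopped yet.
 */
static int model_count_live_backends(struct model_dev *dev)
{
	int i, live = 0;

	assert(dev != NULL);

	for (i = 0; i < dev->nvqs; ++i) {
		struct model_vq *vq = dev->vqs[i];

		/* Read the backend under the same lock the handlers take. */
		pthread_mutex_lock(&vq->mutex);
		if (vq->private_data != NULL) {
			fprintf(stderr, "vq %d still has a backend\n", i);
			++live;
		}
		pthread_mutex_unlock(&vq->mutex);
	}

	return live;
}
```

Running this at the top of the cleanup path would flag a device whose
backends were not cleared first, which is the class of bug we want the
warning to catch.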

Thanks,
Stefano