On Tue, Mar 08, 2022 at 08:08:25AM +0000, Lee Jones wrote:
> On Tue, 08 Mar 2022, Jason Wang wrote:
>
> > On Tue, Mar 8, 2022 at 3:18 AM Lee Jones <lee.jones@xxxxxxxxxx> wrote:
> > >
> > > vhost_vsock_handle_tx_kick() already holds the mutex during its call
> > > to vhost_get_vq_desc(). All we have to do here is take the same lock
> > > during virtqueue clean-up and we mitigate the reported issues.
> > >
> > > Also WARN() as a precautionary measure. The purpose of this is to
> > > capture possible future race conditions which may pop up over time.
> > >
> > > Link: https://syzkaller.appspot.com/bug?extid=279432d30d825e63ba00
> > >
> > > Cc: <stable@xxxxxxxxxxxxxxx>
> > > Reported-by: syzbot+adc3cb32385586bec859@xxxxxxxxxxxxxxxxxxxxxxxxx
> > > Signed-off-by: Lee Jones <lee.jones@xxxxxxxxxx>
> > > ---
> > >  drivers/vhost/vhost.c | 10 ++++++++++
> > >  1 file changed, 10 insertions(+)
> > >
> > > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> > > index 59edb5a1ffe28..ef7e371e3e649 100644
> > > --- a/drivers/vhost/vhost.c
> > > +++ b/drivers/vhost/vhost.c
> > > @@ -693,6 +693,15 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
> > >  	int i;
> > >
> > >  	for (i = 0; i < dev->nvqs; ++i) {
> > > +		/* No workers should run here by design. However, races have
> > > +		 * previously occurred where drivers have been unable to flush
> > > +		 * all work properly prior to clean-up. Without a successful
> > > +		 * flush the guest will malfunction, but avoiding host memory
> > > +		 * corruption in those cases does seem preferable.
> > > +		 */
> > > +		WARN_ON(mutex_is_locked(&dev->vqs[i]->mutex));
> > > +
> >
> > I don't get how this can help, the mutex could be grabbed in the
> > middle of the above and below line.
>
> The worst that happens in this slim scenario is we miss a warning.
> The mutexes below will still function as expected and prevent possible
> memory corruption.

maybe. or maybe corruption already happened and this is the fallout.

> > > +		mutex_lock(&dev->vqs[i]->mutex);
> > >  		if (dev->vqs[i]->error_ctx)
> > >  			eventfd_ctx_put(dev->vqs[i]->error_ctx);
> > >  		if (dev->vqs[i]->kick)
> > > @@ -700,6 +709,7 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
> > >  		if (dev->vqs[i]->call_ctx.ctx)
> > >  			eventfd_ctx_put(dev->vqs[i]->call_ctx.ctx);
> > >  		vhost_vq_reset(dev, dev->vqs[i]);
> > > +		mutex_unlock(&dev->vqs[i]->mutex);
> > >  	}
> >
> > I'm not sure it's correct to assume some behaviour of a buggy device.
> > For the device mutex, we use that to protect more than just err/call
> > and vq.
>
> When I authored this, I did so as *the* fix. However, since the cause
> of today's crash has now been patched, this has become a belt and
> braces solution. Michael's addition of the WARN() also has the
> benefit of providing us with an early warning system for future
> breakages. Personally, I think it's kinda neat.
>
> --
> Lee Jones [李琼斯]
> Principal Technical Lead - Developer Services
> Linaro.org │ Open source software for Arm SoCs
> Follow Linaro: Facebook | Twitter | Blog
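
For anyone following the WARN_ON(mutex_is_locked()) / mutex_lock() exchange above, here is a rough userspace sketch of the window Jason points at. It is not kernel code and not part of the patch. It uses POSIX mutexes, every demo_* name is hypothetical, demo_mutex_is_locked() is only a trylock-based approximation of the kernel's mutex_is_locked(), and the "worker" is simulated by a callback invoked in the gap between the check and the lock, so the interleaving is deterministic rather than a real race.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a virtqueue; not the kernel's struct. */
struct demo_vq {
	pthread_mutex_t mutex;
	const int *ctx;		/* plays the role of error_ctx/kick/call_ctx */
	long uses;		/* how often a worker dereferenced ctx */
};

typedef void (*demo_hook_fn)(struct demo_vq *vq);

/* Worker path: only touches ctx while holding the mutex. */
static void demo_worker_once(struct demo_vq *vq)
{
	pthread_mutex_lock(&vq->mutex);
	if (vq->ctx)
		vq->uses += *vq->ctx;
	pthread_mutex_unlock(&vq->mutex);
}

/* Best-effort probe (assumption: approximates mutex_is_locked()). */
static bool demo_mutex_is_locked(pthread_mutex_t *m)
{
	if (pthread_mutex_trylock(m) == 0) {
		pthread_mutex_unlock(m);
		return false;
	}
	return true;
}

/*
 * Clean-up path shaped like the patched loop body: probe, then lock.
 * window_hook runs between the two to simulate a worker slipping in.
 * Returns true if the probe would have triggered the warning.
 */
static bool demo_cleanup(struct demo_vq *vq, demo_hook_fn window_hook)
{
	bool warned = demo_mutex_is_locked(&vq->mutex);

	if (window_hook)
		window_hook(vq);	/* worker locks, works, unlocks here */

	pthread_mutex_lock(&vq->mutex);
	vq->ctx = NULL;			/* analogue of the put/reset calls */
	pthread_mutex_unlock(&vq->mutex);
	return warned;
}

/* Returns 0 if the sketch behaves as described in the note below. */
static int demo_scenario(void)
{
	static const int value = 1;
	struct demo_vq vq = { .ctx = &value, .uses = 0 };
	bool warned;
	int ok;

	pthread_mutex_init(&vq.mutex, NULL);
	warned = demo_cleanup(&vq, demo_worker_once);
	demo_worker_once(&vq);	/* late worker: finds ctx == NULL, no use */
	ok = !warned && vq.uses == 1 && vq.ctx == NULL;
	pthread_mutex_destroy(&vq.mutex);
	return ok ? 0 : -1;
}
```

In this sketch the worker that sneaks into the window goes unreported (the probe returned false), yet it finishes under the lock before clean-up clears ctx, and a later worker sees NULL instead of a stale pointer. That matches Lee's "we miss a warning" case. It says nothing about Michael's point, since a warning that does fire may be reporting damage done before clean-up ever ran.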