Re: [PATCH 1/1] vhost: Provide a kernel warning if mutex is held whilst clean-up in progress

Leon Romanovsky <leon@xxxxxxxxxx> · Fri, 4 Mar 2022 09:08:05 +0200

On Thu, Mar 03, 2022 at 04:01:06PM -0500, Michael S. Tsirkin wrote:
> On Thu, Mar 03, 2022 at 09:14:36PM +0200, Leon Romanovsky wrote:
> > On Thu, Mar 03, 2022 at 03:19:29PM +0000, Lee Jones wrote:
> > > All workers/users should be halted before any clean-up should take place.
> > > 
> > > Suggested-by:  Michael S. Tsirkin <mst@xxxxxxxxxx>
> > > Signed-off-by: Lee Jones <lee.jones@xxxxxxxxxx>
> > > ---
> > >  drivers/vhost/vhost.c | 3 +++
> > >  1 file changed, 3 insertions(+)
> > > 
> > > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> > > index bbaff6a5e21b8..d935d2506963f 100644
> > > --- a/drivers/vhost/vhost.c
> > > +++ b/drivers/vhost/vhost.c
> > > @@ -693,6 +693,9 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
> > >  	int i;
> > >  
> > >  	for (i = 0; i < dev->nvqs; ++i) {
> > > +		/* Ideally all workers should be stopped prior to clean-up */
> > > +		WARN_ON(mutex_is_locked(&dev->vqs[i]->mutex));
> > > +
> > >  		mutex_lock(&dev->vqs[i]->mutex);
> > 
> > I know nothing about vhost, but this construction and patch looks
> > strange to me.
> > 
> > If all workers were stopped, you won't need mutex_lock(). The mutex_lock
> > here suggests to me that workers can still run here.
> > 
> > Thanks
> 
> 
> "Ideally" here is misleading, we need a bigger detailed comment
> along the lines of:
> 
> /* 
>  * By design, no workers can run here. But if there's a bug and the
>  * driver did not flush all work properly then they might, and we
>  * encountered such bugs in the past.  With no proper flush guest won't
>  * work correctly but avoiding host memory corruption in this case
>  * sounds like a good idea.
>  */

This description looks better, but the check is inherently racy.
Why don't you add a comment and mutex_lock()? The WARN_ON here is
more distraction than actual help.

Thanks

> 
> > >  		if (dev->vqs[i]->error_ctx)
> > >  			eventfd_ctx_put(dev->vqs[i]->error_ctx);
> > > -- 
> > > 2.35.1.574.g5d30c73bfb-goog
> > > 
>