On Thu, Mar 03, 2022 at 04:01:06PM -0500, Michael S. Tsirkin wrote:
> On Thu, Mar 03, 2022 at 09:14:36PM +0200, Leon Romanovsky wrote:
> > On Thu, Mar 03, 2022 at 03:19:29PM +0000, Lee Jones wrote:
> > > All workers/users should be halted before any clean-up takes place.
> > >
> > > Suggested-by: Michael S. Tsirkin <mst@xxxxxxxxxx>
> > > Signed-off-by: Lee Jones <lee.jones@xxxxxxxxxx>
> > > ---
> > >  drivers/vhost/vhost.c | 3 +++
> > >  1 file changed, 3 insertions(+)
> > >
> > > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> > > index bbaff6a5e21b8..d935d2506963f 100644
> > > --- a/drivers/vhost/vhost.c
> > > +++ b/drivers/vhost/vhost.c
> > > @@ -693,6 +693,9 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
> > >  	int i;
> > >
> > >  	for (i = 0; i < dev->nvqs; ++i) {
> > > +		/* Ideally all workers should be stopped prior to clean-up */
> > > +		WARN_ON(mutex_is_locked(&dev->vqs[i]->mutex));
> > > +
> > >  		mutex_lock(&dev->vqs[i]->mutex);
> > I know nothing about vhost, but this construction and patch look
> > strange to me.
> >
> > If all workers were stopped, you wouldn't need the mutex_lock(). The
> > mutex_lock() here suggests to me that workers can still be running at
> > this point.
> >
> > Thanks
"Ideally" here is misleading, we need a bigger detailed comment
along the lines of:
/*
* By design, no workers can run here. But if there's a bug and the
* driver did not flush all work properly then they might, and we
* encountered such bugs in the past. With no proper flush guest won't
* work correctly but avoiding host memory corruption in this case
* sounds like a good idea.
*/
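
If it helps, applied to the posted hunk I think that would read
something like this (just a sketch, keeping the check before the
mutex_lock() as in the patch):

	for (i = 0; i < dev->nvqs; ++i) {
		/*
		 * By design, no workers can run here. But if there's a bug
		 * and the driver did not flush all work properly, then they
		 * might, and we encountered such bugs in the past. With no
		 * proper flush the guest won't work correctly, but avoiding
		 * host memory corruption in this case sounds like a good
		 * idea.
		 */
		WARN_ON(mutex_is_locked(&dev->vqs[i]->mutex));

		mutex_lock(&dev->vqs[i]->mutex);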
Alternatively, can we use vhost_vq_get_backend() to check for this
situation? IIUC all the vhost devices clear the backend to stop the
workers. This is not racy (if we do it after the mutex_lock()) and
should cover all cases.
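
Untested, but I mean something along these lines (a rough sketch,
assuming every device really does clear the backend pointer read by
vhost_vq_get_backend() before vhost_dev_cleanup() is reached):

	for (i = 0; i < dev->nvqs; ++i) {
		mutex_lock(&dev->vqs[i]->mutex);
		/*
		 * By design, no workers can run here: the device should
		 * have cleared its backend (and flushed all work) before
		 * vhost_dev_cleanup() is called. Warn if a buggy device
		 * did not, to avoid host memory corruption.
		 */
		WARN_ON(vhost_vq_get_backend(dev->vqs[i]));
		/* ... rest of the per-vq cleanup ... */
	}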
Thanks,
Stefano