Re: [PATCH 1/1] vhost: Protect the virtqueue from being cleared whilst still in use

Leon Romanovsky <leon@xxxxxxxxxx> · Wed, 9 Mar 2022 20:52:14 +0200

On Tue, Mar 08, 2022 at 09:57:57AM +0100, Greg KH wrote:
> On Tue, Mar 08, 2022 at 08:10:06AM +0000, Lee Jones wrote:
> > On Mon, 07 Mar 2022, Greg KH wrote:
> > 
> > > On Mon, Mar 07, 2022 at 07:17:57PM +0000, Lee Jones wrote:
> > > > vhost_vsock_handle_tx_kick() already holds the mutex during its call
> > > > to vhost_get_vq_desc().  All we have to do here is take the same lock
> > > > during virtqueue clean-up and we mitigate the reported issues.
> > > > 
> > > > Also WARN() as a precautionary measure.  The purpose of this is to
> > > > capture possible future race conditions which may pop up over time.
> > > > 
> > > > Link: https://syzkaller.appspot.com/bug?extid=279432d30d825e63ba00
> > > > 
> > > > Cc: <stable@xxxxxxxxxxxxxxx>
> > > > Reported-by: syzbot+adc3cb32385586bec859@xxxxxxxxxxxxxxxxxxxxxxxxx
> > > > Signed-off-by: Lee Jones <lee.jones@xxxxxxxxxx>
> > > > ---
> > > >  drivers/vhost/vhost.c | 10 ++++++++++
> > > >  1 file changed, 10 insertions(+)
> > > > 
> > > > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> > > > index 59edb5a1ffe28..ef7e371e3e649 100644
> > > > --- a/drivers/vhost/vhost.c
> > > > +++ b/drivers/vhost/vhost.c
> > > > @@ -693,6 +693,15 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
> > > >  	int i;
> > > >  
> > > >  	for (i = 0; i < dev->nvqs; ++i) {
> > > > +		/* No workers should run here by design. However, races have
> > > > +		 * previously occurred where drivers have been unable to flush
> > > > +		 * all work properly prior to clean-up.  Without a successful
> > > > +		 * flush the guest will malfunction, but avoiding host memory
> > > > +		 * corruption in those cases does seem preferable.
> > > > +		 */
> > > > +		WARN_ON(mutex_is_locked(&dev->vqs[i]->mutex));
> > > 
> > > So you are trading one syzbot triggered issue for another one in the
> > > future?  :)
> > > 
> > > If this ever can happen, handle it, but don't log it with a WARN_ON() as
> > > that will trigger the panic-on-warn boxes, as well as syzbot.  Unless
> > > you want that to happen?
> > 
> > No, Syzbot doesn't report warnings, only BUGs and memory corruption.
> 
> Has it changed?  Last I looked, it did trigger on WARN_* calls, which
> has resulted in a huge number of kernel fixes because of that.
> 
> > > And what happens if the mutex is locked _RIGHT_ after you checked it?
> > > You still have a race...
> > 
> > No, we miss a warning that one time.  Memory is still protected.
> 
> Then don't warn on something that doesn't matter.  This line can be
> dropped as there's nothing anyone can do about it, right?

Greg, at least two other reviewers said that this line shouldn't be
there at all.

https://lore.kernel.org/all/CACGkMEsjmCNQPjxPjXL0WUfbMg8ARnumEp4yjUxqznMKR1nKSQ@xxxxxxxxxxxxxx/
https://lore.kernel.org/all/YiG61RqXFvq%2Ft0fB@unreal/
https://lore.kernel.org/all/YiETnIcfZCLb63oB@unreal/
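
To illustrate why the check adds nothing, here is a rough userspace
sketch using pthreads. It is only an analogy, not vhost code: struct
demo_vq, demo_vq_is_locked() and demo_vq_cleanup() are made-up names,
and the trylock-based helper merely approximates what
mutex_is_locked() reports. The answer from the check can be stale the
moment it returns, so whatever protection exists comes from taking
the lock around the reset.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a virtqueue; not struct vhost_virtqueue. */
struct demo_vq {
	pthread_mutex_t mutex;
	void *private_data;
	int resets;
};

/*
 * Advisory check, loosely analogous to mutex_is_locked().  Another
 * thread may take or drop the mutex right after this returns, so the
 * result cannot be used to decide whether touching the vq is safe.
 */
static bool demo_vq_is_locked(struct demo_vq *vq)
{
	if (pthread_mutex_trylock(&vq->mutex) != 0)
		return true;	/* treated as busy: someone else holds it */

	pthread_mutex_unlock(&vq->mutex);
	return false;
}

/*
 * Clean-up that relies only on the lock.  A concurrent user holding
 * the mutex makes us wait; no prior "is it locked?" test is needed.
 */
static void demo_vq_cleanup(struct demo_vq *vq)
{
	assert(vq != NULL);

	pthread_mutex_lock(&vq->mutex);
	vq->private_data = NULL;
	vq->resets++;
	pthread_mutex_unlock(&vq->mutex);
}
```

With or without a call to demo_vq_is_locked() in front of it,
demo_vq_cleanup() behaves the same, which is the argument for
dropping the WARN_ON().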

Thanks

> 
> thanks,
> 
> greg k-h