On Tue, Feb 22, 2022 at 01:06:12AM +0530, Anirudh Rayabharam wrote:
On Mon, Feb 21, 2022 at 07:26:28PM +0100, Stefano Garzarella wrote:
On Mon, Feb 21, 2022 at 11:33:11PM +0530, Anirudh Rayabharam wrote:
> On Mon, Feb 21, 2022 at 05:44:20PM +0100, Stefano Garzarella wrote:
> > On Mon, Feb 21, 2022 at 09:44:39PM +0530, Anirudh Rayabharam wrote:
> > > On Mon, Feb 21, 2022 at 02:59:30PM +0100, Stefano Garzarella wrote:
> > > > On Mon, Feb 21, 2022 at 12:49 PM Stefano Garzarella <sgarzare@xxxxxxxxxx> wrote:
> > > > >
> > > > > vhost_vsock_stop() calls vhost_dev_check_owner() to check the device
> > > > > ownership. It expects current->mm to be valid.
> > > > >
> > > > > vhost_vsock_stop() is also called by vhost_vsock_dev_release() when
> > > > > the user has not done close(), so when we are in do_exit(). In this
> > > > > case current->mm is invalid and we're releasing the device, so we
> > > > > should clean it anyway.
> > > > >
> > > > > Let's check the owner only when vhost_vsock_stop() is called
> > > > > by an ioctl.
> > > > >
> > > > > Fixes: 433fc58e6bf2 ("VSOCK: Introduce vhost_vsock.ko")
> > > > > Cc: stable@xxxxxxxxxxxxxxx
> > > > > Reported-by: syzbot+1e3ea63db39f2b4440e0@xxxxxxxxxxxxxxxxxxxxxxxxx
> > > > > Signed-off-by: Stefano Garzarella <sgarzare@xxxxxxxxxx>
> > > > > ---
> > > > > drivers/vhost/vsock.c | 14 ++++++++------
> > > > > 1 file changed, 8 insertions(+), 6 deletions(-)
> > > >
> > > > Reported-and-tested-by: syzbot+0abd373e2e50d704db87@xxxxxxxxxxxxxxxxxxxxxxxxx
> > >
> > > I don't think this patch fixes "INFO: task hung in vhost_work_dev_flush"
> > > even though syzbot says so. I am able to reproduce the issue locally
> > > even with this patch applied.
> >
> > Are you using the syzbot reproducer or another test?
> > In that case, can you share it?
>
> I am using the syzbot reproducer.
>
> >
> > From the stack trace it seemed to me that the worker accesses a zone that
> > has been cleaned (iotlb), so it is invalid and fails.
>
> Would the thread hang in that case? How?
Looking at this log [1], it seems that the process is blocked on the
wait_for_completion() in vhost_work_dev_flush().

Since we're not setting the backend to NULL to stop the worker, it's likely
that the worker will keep running, preventing the flush work from
completing.

The log shows that the worker thread is stuck in iotlb_access_ok(). How
will setting the backend to NULL stop it? During my debugging I found
that the worker is stuck in this while loop:

Okay, looking at your new patch, now I see. If we enter this loop
before setting the backend to NULL and we have start = 0 and
end = (u64)-1, we would stay there forever.

I'll remove that tag in v2, but the test might fail without this patch
applied, because for now we don't stop workers correctly.
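
A rough userspace sketch of why start = 0 and end = (u64)-1 can leave such
a loop without progress. This is not the kernel code: struct range,
range_size() and walk_makes_progress() are hypothetical names, and the
assumption that the entry size is derived as last - start + 1 is mine,
not taken from the thread. Under that assumption the size wraps to 0 in
64-bit unsigned arithmetic, so a loop that advances by the size never
moves forward.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a translation entry; not the kernel struct. */
struct range {
	uint64_t start;
	uint64_t last;
};

/* Assumed size derivation: wraps to 0 for the range [0, UINT64_MAX]. */
static uint64_t range_size(const struct range *r)
{
	return r->last - r->start + 1;
}

/*
 * Walk len bytes starting at addr through a single range.
 * Returns false as soon as an iteration would advance by 0 bytes,
 * which is the point where an unguarded loop would spin forever.
 */
static bool walk_makes_progress(const struct range *r, uint64_t addr,
				uint64_t len)
{
	uint64_t s = 0;

	while (len > s) {
		uint64_t size = range_size(r) - addr + r->start;

		if (size == 0)
			return false;	/* no progress: s and addr never change */
		s += size;
		addr += size;
	}
	return true;
}
```

With the range { 0, UINT64_MAX } the first iteration already computes a
step of 0, while an ordinary range such as { 0, 4095 } completes normally.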
Thanks,
Stefano
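
For reference, the idea in the commit message quoted above (check the
owner only when the stop is requested through an ioctl, and skip the check
on release) can be sketched in userspace roughly as follows. All names
here (struct mm, struct dev, check_owner(), dev_stop()) are hypothetical
stand-ins, not the vhost API, and the flag-based shape is only one possible
way to express it.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the kernel's mm_struct or vhost_dev. */
struct mm {
	int id;
};

struct dev {
	const struct mm *owner;
	bool running;
};

/* Ownership check: the caller's mm must match the device owner. */
static int check_owner(const struct dev *d, const struct mm *current_mm)
{
	return d->owner == current_mm ? 0 : -EPERM;
}

/*
 * Stop the device. The ioctl path passes check = true; the release
 * path passes check = false, because the caller's mm may no longer be
 * valid there and the device has to be cleaned up anyway.
 */
static int dev_stop(struct dev *d, const struct mm *current_mm, bool check)
{
	if (check) {
		int ret = check_owner(d, current_mm);

		if (ret)
			return ret;
	}
	d->running = false;
	return 0;
}
```

In this sketch a caller with no valid mm gets -EPERM on the checked path
and leaves the device running, while the unchecked path always stops it.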