On Mon, Feb 21, 2022 at 07:26:28PM +0100, Stefano Garzarella wrote:
> On Mon, Feb 21, 2022 at 11:33:11PM +0530, Anirudh Rayabharam wrote:
> > On Mon, Feb 21, 2022 at 05:44:20PM +0100, Stefano Garzarella wrote:
> > > On Mon, Feb 21, 2022 at 09:44:39PM +0530, Anirudh Rayabharam wrote:
> > > > On Mon, Feb 21, 2022 at 02:59:30PM +0100, Stefano Garzarella wrote:
> > > > > On Mon, Feb 21, 2022 at 12:49 PM Stefano Garzarella <sgarzare@xxxxxxxxxx> wrote:
> > > > > >
> > > > > > vhost_vsock_stop() calls vhost_dev_check_owner() to check the device
> > > > > > ownership. It expects current->mm to be valid.
> > > > > >
> > > > > > vhost_vsock_stop() is also called by vhost_vsock_dev_release() when
> > > > > > the user has not done close(), so when we are in do_exit(). In this
> > > > > > case current->mm is invalid and we're releasing the device, so we
> > > > > > should clean it anyway.
> > > > > >
> > > > > > Let's check the owner only when vhost_vsock_stop() is called
> > > > > > by an ioctl.
> > > > > >
> > > > > > Fixes: 433fc58e6bf2 ("VSOCK: Introduce vhost_vsock.ko")
> > > > > > Cc: stable@xxxxxxxxxxxxxxx
> > > > > > Reported-by: syzbot+1e3ea63db39f2b4440e0@xxxxxxxxxxxxxxxxxxxxxxxxx
> > > > > > Signed-off-by: Stefano Garzarella <sgarzare@xxxxxxxxxx>
> > > > > > ---
> > > > > >  drivers/vhost/vsock.c | 14 ++++++++------
> > > > > >  1 file changed, 8 insertions(+), 6 deletions(-)
> > > > >
> > > > > Reported-and-tested-by: syzbot+0abd373e2e50d704db87@xxxxxxxxxxxxxxxxxxxxxxxxx
> > > >
> > > > I don't think this patch fixes "INFO: task hung in vhost_work_dev_flush"
> > > > even though syzbot says so. I am able to reproduce the issue locally
> > > > even with this patch applied.
> > >
> > > Are you using the syzbot reproducer or another test?
> > > In that case, can you share it?
> >
> > I am using the syzbot reproducer.
> >
> > > From the stack trace it seemed to me that the worker accesses a zone that
> > > has been cleaned (iotlb), so it is invalid and fails.
> >
> > Would the thread hang in that case? How?
>
> Looking at this log [1] it seems that the process is blocked on the
> wait_for_completion() in vhost_work_dev_flush().
>
> Since we're not setting the backend to NULL to stop the worker, it's likely
> that the worker will keep running, preventing the flush work from
> completing.

The log shows that the worker thread is stuck in iotlb_access_ok(). How
will setting the backend to NULL stop it? During my debugging I found
that the worker is stuck in this while loop in iotlb_access_ok():

	while (len > s) {
		map = vhost_iotlb_itree_first(umem, addr, last);
		if (map == NULL || map->start > addr) {
			vhost_iotlb_miss(vq, addr, access);
			return false;
		} else if (!(map->perm & access)) {
			/* Report the possible access violation by
			 * request another translation from userspace.
			 */
			return false;
		}

		/* debug print added while debugging; not in the upstream function */
		pr_info("iotlb_access_ok: after msize=%llu, mstart=%llu\n",
			map->size, map->start);
		size = map->size - addr + map->start;

		if (orig_addr == addr && size >= len)
			vhost_vq_meta_update(vq, map, type);

		s += size;
		addr += size;
	}

>
> [1] https://syzkaller.appspot.com/text?tag=CrashLog&x=153f0852700000
>
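A minimal userspace model of the loop in iotlb_access_ok(), only to make the arithmetic concrete. It assumes a single IOTLB entry, replaces vhost_iotlb_itree_first() with a plain [start, last] range check, and uses hypothetical names (struct model_map, make_map(), model_walk()); it is not kernel code. Each iteration advances s and addr by map->size - addr + map->start, so if that step is 0 the loop can never exit. One hypothetical way to get there is an entry for the full range [0, UINT64_MAX]: its size, last - start + 1, wraps to 0 in 64-bit arithmetic, and an access at addr == map->start then computes a step of 0. Whether this is what the syzbot reproducer triggers is an assumption here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct vhost_iotlb_map: only the
 * fields the loop needs. This is NOT the kernel definition. */
struct model_map {
	uint64_t start;	/* first address covered */
	uint64_t last;	/* last address covered (inclusive), used for lookup */
	uint64_t size;	/* cached as last - start + 1 */
};

enum walk_result { WALK_OK, WALK_MISS, WALK_STALLED };

/* Build an entry for [start, last]. For start == 0 and last == UINT64_MAX
 * the cached size wraps around to 0 in 64-bit unsigned arithmetic. */
static struct model_map make_map(uint64_t start, uint64_t last)
{
	struct model_map m;

	assert(start <= last);
	m.start = start;
	m.last = last;
	m.size = last - start + 1;
	return m;
}

/* Userspace model of the while loop in iotlb_access_ok() with a single
 * entry. Instead of spinning, it reports WALK_STALLED when the computed
 * step is 0, i.e. when neither s nor addr would advance. */
static enum walk_result model_walk(const struct model_map *map,
				   uint64_t addr, uint64_t len)
{
	uint64_t s = 0;

	while (len > s) {
		uint64_t size;

		/* Hypothetical stand-in for vhost_iotlb_itree_first(). */
		if (addr < map->start || addr > map->last)
			return WALK_MISS;

		size = map->size - addr + map->start;
		if (size == 0)
			return WALK_STALLED;	/* the real loop would never exit */

		s += size;
		addr += size;
	}
	return WALK_OK;
}
```

For an entry whose size matches its range, any addr inside it gives a step of size - (addr - start) >= 1, so the loop always makes progress. Only an entry whose cached size disagrees with [start, last], such as the wrapped 0 above, can make it stand still.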