On Sun, Jul 21, 2019 at 06:02:52AM -0400, Michael S. Tsirkin wrote:
> On Sat, Jul 20, 2019 at 03:08:00AM -0700, syzbot wrote:
> > syzbot has bisected this bug to:
> >
> > commit 7f466032dc9e5a61217f22ea34b2df932786bbfc
> > Author: Jason Wang <jasowang@xxxxxxxxxx>
> > Date: Fri May 24 08:12:18 2019 +0000
> >
> >     vhost: access vq metadata through kernel virtual address
> >
> > bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=149a8a20600000
> > start commit: 6d21a41b Add linux-next specific files for 20190718
> > git tree: linux-next
> > final crash: https://syzkaller.appspot.com/x/report.txt?x=169a8a20600000
> > console output: https://syzkaller.appspot.com/x/log.txt?x=129a8a20600000
> > kernel config: https://syzkaller.appspot.com/x/.config?x=3430a151e1452331
> > dashboard link: https://syzkaller.appspot.com/bug?extid=e58112d71f77113ddb7b
> > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=10139e68600000
> >
> > Reported-by: syzbot+e58112d71f77113ddb7b@xxxxxxxxxxxxxxxxxxxxxxxxx
> > Fixes: 7f466032dc9e ("vhost: access vq metadata through kernel virtual
> > address")
> >
> > For information about bisection process see: https://goo.gl/tpsmEJ#bisection
>
> OK I poked at this for a bit, I see several things that
> we need to fix, though I'm not yet sure it's the reason for
> the failures:

This stuff looks quite similar to the hmm_mirror use model and other
places in the kernel. I'm still hoping we can share this code a bit more.

There is another bug, this sequence here:

vhost_vring_set_num_addr()
   mmu_notifier_unregister()
   [..]
   mmu_notifier_register()

Which I think is trying to create a lock to protect dev->vqs.

It has the problem that mmu_notifier_unregister() doesn't guarantee
that invalidate_start/end are fully paired.

So after any unregister, the code has to clean up any resulting
unbalanced invalidate_count before it can call mmu_notifier_register()
again, i.e. zero the invalidate_count.
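A minimal userspace model of the unbalanced-counter problem, assuming that a notifier which has been unregistered simply stops receiving callbacks. Everything in it is hypothetical: struct fake_dev and the fake_* functions only stand in for the device's invalidate_count and the notifier callbacks. They are not vhost or mmu_notifier APIs. The model shows how an _end lost to the unregister leaves the count stuck at 1, and how zeroing the count before re-registering restores balance.

```c
#include <assert.h>

/* Hypothetical model, not kernel code: only the counter that the
 * invalidate_range_start()/_end() callbacks would maintain. */
struct fake_dev {
	int invalidate_count;	/* > 0 while an invalidation is in flight */
	int registered;		/* is the (modelled) notifier registered? */
};

/* Models ->invalidate_range_start(): only delivered while registered. */
static void fake_invalidate_start(struct fake_dev *d)
{
	if (d->registered)
		d->invalidate_count++;
}

/* Models ->invalidate_range_end(): also only delivered while registered,
 * so an unregister between start and end drops the matching decrement. */
static void fake_invalidate_end(struct fake_dev *d)
{
	if (d->registered)
		d->invalidate_count--;
}

/* Models mmu_notifier_unregister(): no further callbacks are delivered. */
static void fake_unregister(struct fake_dev *d)
{
	d->registered = 0;
}

/* Models re-registration. The count is zeroed first because start/end
 * pairs that straddled the unregister may have left it unbalanced. */
static void fake_register(struct fake_dev *d)
{
	d->invalidate_count = 0;
	d->registered = 1;
}

/* Walks the problematic sequence on a registered device whose count is
 * 0. Returns the count seen just before re-registering (1: unbalanced). */
static int fake_demo(struct fake_dev *d)
{
	int stale;

	fake_invalidate_start(d);	/* start is delivered ... */
	fake_unregister(d);		/* ... unregister lands in between ... */
	fake_invalidate_end(d);		/* ... so the matching end is lost */
	stale = d->invalidate_count;

	fake_register(d);
	assert(d->invalidate_count == 0);
	return stale;
}
```

Without the reset in fake_register(), the stale count of 1 would survive the re-register and make every later access look as if it raced with an invalidation.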
It also seems really weird that vhost_map_prefetch() can fail, i.e. due
to __get_user_pages_fast() needing to block, but that failure just
silently (permanently?) disables the optimization?? At least the usage
here would be better done with a seqcount lock and a normal blocking
call to get_user_pages_fast()...

Jason
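The following single-threaded sketch shows the seqcount-plus-blocking-fetch idea under stated assumptions. All names are hypothetical: struct fake_map, fake_blocking_gup() (a stand-in for a blocking get_user_pages_fast()) and fake_map_prefetch() are not vhost or kernel functions. The invalidation side makes the sequence odd while it runs. The prefetch samples the sequence, performs the blocking fetch, and retries only if an invalidation overlapped it, instead of giving up on the first failure.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical model: a seqcount guarding a cached "prefetched" mapping.
 * The invalidation side makes seq odd while it runs and even when done. */
struct fake_map {
	atomic_uint seq;
	int valid;		/* models "prefetched pages are usable" */
};

static void fake_inval_begin(struct fake_map *m)
{
	atomic_fetch_add(&m->seq, 1);	/* odd: invalidation in progress */
	m->valid = 0;
}

static void fake_inval_end(struct fake_map *m)
{
	atomic_fetch_add(&m->seq, 1);	/* even again */
}

/* Stand-in for a normal, blocking get_user_pages_fast(). To exercise the
 * retry path, the first call simulates an invalidation racing with it. */
static int fake_gup_calls;
static void fake_blocking_gup(struct fake_map *m)
{
	if (++fake_gup_calls == 1) {
		fake_inval_begin(m);
		fake_inval_end(m);
	}
}

/* Do the blocking fetch and retry only if an invalidation overlapped it.
 * In real code the final seq check and the publish of valid = 1 must be
 * done under the same lock the invalidation path takes; this
 * single-threaded sketch leaves that lock out. */
static void fake_map_prefetch(struct fake_map *m)
{
	for (;;) {
		unsigned int start = atomic_load(&m->seq);

		if (start & 1)
			continue;	/* invalidation running: sample again */
		fake_blocking_gup(m);
		if (atomic_load(&m->seq) == start) {
			m->valid = 1;
			return;
		}
		/* seq moved: the pages may be stale, so fetch again */
	}
}
```

In this model the first fetch is discarded because the sequence moved from 0 to 2 while the fetch was in progress. The second fetch succeeds, so the optimization is never silently switched off.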