On Fri, Feb 15, 2019 at 6:13 PM Dmitry Vyukov <dvyukov@xxxxxxxxxx> wrote: > On Fri, Feb 15, 2019 at 6:10 PM Jann Horn <jannh@xxxxxxxxxx> wrote: > > On Fri, Feb 15, 2019 at 5:45 PM Dmitry Vyukov <dvyukov@xxxxxxxxxx> wrote: > > > On Fri, Feb 15, 2019 at 5:03 PM Jann Horn <jannh@xxxxxxxxxx> wrote: > > > > On Fri, Feb 15, 2019 at 4:40 PM Dmitry Vyukov <dvyukov@xxxxxxxxxx> wrote: > > > > > On Thu, Oct 11, 2018 at 4:18 PM Paolo Bonzini <pbonzini@xxxxxxxxxx> wrote: > > > > > > On 10/10/2018 09:58, syzbot wrote: > > > > > > > do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:316 > > > > > > > invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:993 > > > > > > > RIP: 0010:refcount_inc_checked+0x5d/0x70 lib/refcount.c:153 > > > > > > > kvm_get_kvm arch/x86/kvm/../../../virt/kvm/kvm_main.c:766 [inline] > > > > > > > kvm_ioctl_create_device arch/x86/kvm/../../../virt/kvm/kvm_main.c:2924 > > > > > > > kvm_vm_ioctl+0xed7/0x1d40 arch/x86/kvm/../../../virt/kvm/kvm_main.c:3114 > > > > > > > vfs_ioctl fs/ioctl.c:46 [inline] > > > > > > > file_ioctl fs/ioctl.c:501 [inline] > > > > > > > do_vfs_ioctl+0x1de/0x1720 fs/ioctl.c:685 > > > > > > > ksys_ioctl+0xa9/0xd0 fs/ioctl.c:702 > > > > > > > __do_sys_ioctl fs/ioctl.c:709 [inline] > > > > > > > __se_sys_ioctl fs/ioctl.c:707 [inline] > > > > > > > __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:707 > > > > > > > do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290 > > > > > > > entry_SYSCALL_64_after_hwframe+0x49/0xbe > > > > > > > > > > > > The trace here is fairly simple, but I don't understand how this could > > > > > > happen. > > > > > > > > > > > > The kvm_get_kvm is done within kvm_ioctl_create_device, which is called > > > > > > from ioctl; the last reference cannot disappear inside a ioctl, because: > > > > > > > > > > > > 1) kvm_ioctl is called from vfs_ioctl, which does fdget and holds the fd > > > > > > reference until after kvm_vm_ioctl returns > > > > > > > > > > > > 2) the file descriptor holds one reference to the struct kvm*, and this > > > > > > reference is not released until kvm_vm_release is called by the last > > > > > > fput (which could be fdput's call to fput if the process has exited in > > > > > > the meanwhile) > > > > > > > > > > > > 3) for completeness, in case anon_inode_getfd fails, put_unused_fd will > > > > > > not invoke the file descriptor's ->release callback (in this case > > > > > > kvm_device_release). > > > > > > > > > > > > CCing some random people to get their opinion... > > > > > > > > > > > > Paolo > > > > > > > > > > > > > > > Jann, is it what you fixed in "kvm: fix kvm_ioctl_create_device() > > > > > reference counting (CVE-2019-6974)"? > > > > > If so, we need to close the syzbot bug. > > > > > > > > > > > > > > > > > # See https://goo.gl/kgGztJ for information about syzkaller reproducers. > > > > > > > #{"threaded":true,"collide":true,"repeat":true,"procs":6,"sandbox":"none","fault_call":-1,"tun":true,"tmpdir":true,"cgroups":true,"netdev":true,"resetnet":true,"segv":true} > > > > > > > r0 = openat$kvm(0xffffffffffffff9c, &(0x7f0000000380)='/dev/kvm\x00', 0x0, 0x0) > > > > > > > r1 = syz_open_dev$dspn(&(0x7f0000000100)='/dev/dsp#\x00', 0x3fe, 0x400) > > > > > > > r2 = ioctl$KVM_CREATE_VM(r0, 0xae01, 0x0) > > > > > > > > Here we create a VM fd... > > > > > > > > > > > perf_event_open(&(0x7f0000000040)={0x1, 0x70, 0x0, 0x0, 0x0, 0x0, 0x0, 0x50d}, 0x0, 0xffffffffffffffff, 0xffffffffffffffff, 0x0) > > > > > > > mincore(&(0x7f0000ffc000/0x1000)=nil, 0x1000, &(0x7f00000003c0)=""/4096) > > > > > > > setrlimit(0x0, &(0x7f0000000000)) > > > > > > > readahead(r1, 0x3, 0x9a6) > > > > > > > ioctl$KVM_CREATE_DEVICE(r2, 0xc00caee0, &(0x7f00000002c0)={0x4}) > > > > > > > > ... and here we do the KVM_CREATE_DEVICE ioctl with type==KVM_DEV_TYPE_VFIO. > > > > > > > > So that far it looks exactly like CVE-2019-6974. But CVE-2019-6974 > > > > also requires that someone calls close() on the file descriptor of the > > > > newly created device very quickly, before the ioctl is able to > > > > increment the refcount further, and I don't see anything like that > > > > here. Is there a chance that syzkaller called close() on a file > > > > descriptor while the ioctl() was still running without saying so here > > > > (potentially through dup2() or something like that)? > > > > > > Yes, all fd's are closed at the end of the test: > > > https://github.com/google/syzkaller/blob/master/executor/common_linux.h#L2561-L2568 > > > > Can that happen before the ioctl() has finished? > > Yes, ioctl runs in a separate thread. Alright, then yes, it looks like the same bug. Since the root cause wasn't identified from the original syzkaller report, I wonder whether it would make sense to add infrastructure that makes it easier to identify the root cause from a syzkaller report; here are some random ideas: - A comment at the end of the syzkaller reproducer that lists interesting syscalls that are performed implicitly, in particular close(3..30). Without that information, the race isn't really visible here. - A config option that allows recording (subsets of) stacks, PIDs, and cpu numbers in every function that modifies a refcount, and can dump the last N such records when a refcount detects an error. This would probably be helpful for figuring out refcounting bugs, but I don't actually know how many of the bugs syzkaller finds are refcounting-related - if it's not a lot, this might not be worth the effort.