----- Original Message ----- > From: "Dmitry Vyukov" <dvyukov@xxxxxxxxxx> > To: "Steve Rutherford" <srutherford@xxxxxxxxxx> > Cc: "syzkaller" <syzkaller@xxxxxxxxxxxxxxxx>, "Paolo Bonzini" <pbonzini@xxxxxxxxxx>, "Radim Krčmář" > <rkrcmar@xxxxxxxxxx>, "KVM list" <kvm@xxxxxxxxxxxxxxx>, "LKML" <linux-kernel@xxxxxxxxxxxxxxx> > Sent: Monday, January 16, 2017 10:34:26 PM > Subject: Re: kvm: use-after-free in process_srcu > > On Sun, Jan 15, 2017 at 6:11 PM, Dmitry Vyukov <dvyukov@xxxxxxxxxx> wrote: > > On Fri, Jan 13, 2017 at 10:19 AM, Dmitry Vyukov <dvyukov@xxxxxxxxxx> wrote: > >> On Fri, Jan 13, 2017 at 4:30 AM, Steve Rutherford > >> <srutherford@xxxxxxxxxx> wrote: > >>> I'm not that familiar with the kernel's workqueues, but this seems > >>> like the classic "callback outlives the memory it references" > >>> use-after-free, where the process_srcu callback is outliving struct > >>> kvm (which contains the srcu_struct). If that's right, then calling > >>> srcu_barrier (which should wait for all of the call_srcu callbacks to > >>> complete, which are what enqueue the process_srcu callbacks) before > >>> cleanup_srcu_struct in kvm_destroy_vm probably fixes this. > >>> > >>> The corresponding patch to virt/kvm/kvm_main.c looks something like: > >>> static void kvm_destroy_vm(struct kvm *kvm) > >>> { > >>> ... > >>> for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) > >>> kvm_free_memslots(kvm, kvm->memslots[i]); > >>> + srcu_barrier(&kvm->irq_srcu); > >>> cleanup_srcu_struct(&kvm->irq_srcu); > >>> + srcu_barrier(&kvm->srcu); > >>> cleanup_srcu_struct(&kvm->srcu); > >>> ... > >>> > >>> > >>> Since we don't have a repro, this obviously won't be readily testable. > >>> I find srcu subtle enough that I don't trust my reasoning fully (in > >>> particular, I don't trust that waiting for all of the call_srcu > >>> callbacks to complete also waits for all of the process_srcu > >>> callbacks). Someone else know if that's the case? > >> > >> > >> From the function description it looks like it should do the trick: > >> > >> 514 /** > >> 515 * srcu_barrier - Wait until all in-flight call_srcu() callbacks > >> complete. > >> 516 * @sp: srcu_struct on which to wait for in-flight callbacks. > >> 517 */ > >> 518 void srcu_barrier(struct srcu_struct *sp) > >> > >> I see this failure happening several times per day. I've applied your > >> patch locally and will check if I see these failures happening. > > > > > > I have not seen the crash in 3 days, when usually I see several > > crashes per night. So I think we can consider that the patch fixes the > > crash: > > > > Tested-by: Dmitry Vyukov <dvyukov@xxxxxxxxxx> > > > Unfortunately I hit it again with the patch applied. It definitely > happens less frequently now, but still happens: Try this: diff --git a/kernel/rcu/srcu.c b/kernel/rcu/srcu.c index 9b9cdd549caa..ef5599c65299 100644 --- a/kernel/rcu/srcu.c +++ b/kernel/rcu/srcu.c @@ -283,6 +283,7 @@ void cleanup_srcu_struct(struct srcu_struct *sp) { if (WARN_ON(srcu_readers_active(sp))) return; /* Leakage unless caller handles error. */ + flush_delayed_work(&sp->work); free_percpu(sp->per_cpu_ref); sp->per_cpu_ref = NULL; } I think it should subsume Steve's patch, but I'm not 101% sure. We will have to run this through Paul. Paolo > > BUG: KASAN: use-after-free in rcu_batch_empty kernel/rcu/srcu.c:64 > [inline] at addr ffff88006ada0218 > BUG: KASAN: use-after-free in rcu_batch_dequeue kernel/rcu/srcu.c:75 > [inline] at addr ffff88006ada0218 > BUG: KASAN: use-after-free in srcu_invoke_callbacks > kernel/rcu/srcu.c:624 [inline] at addr ffff88006ada0218 > BUG: KASAN: use-after-free in process_srcu+0x275/0x280 > kernel/rcu/srcu.c:672 at addr ffff88006ada0218 > Read of size 8 by task kworker/2:2/1073 > CPU: 2 PID: 1073 Comm: kworker/2:2 Not tainted 4.10.0-rc3+ #164 > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 > Workqueue: events_power_efficient process_srcu > Call Trace: > __dump_stack lib/dump_stack.c:15 [inline] > dump_stack+0x2ee/0x3ef lib/dump_stack.c:51 > kasan_object_err+0x1c/0x70 mm/kasan/report.c:165 > print_address_description mm/kasan/report.c:203 [inline] > kasan_report_error mm/kasan/report.c:287 [inline] > kasan_report+0x1b6/0x460 mm/kasan/report.c:307 > __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:333 > rcu_batch_empty kernel/rcu/srcu.c:64 [inline] > rcu_batch_dequeue kernel/rcu/srcu.c:75 [inline] > srcu_invoke_callbacks kernel/rcu/srcu.c:624 [inline] > process_srcu+0x275/0x280 kernel/rcu/srcu.c:672 > process_one_work+0xbd0/0x1c10 kernel/workqueue.c:2098 > worker_thread+0x223/0x1990 kernel/workqueue.c:2232 > kthread+0x326/0x3f0 kernel/kthread.c:227 > ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:427 > Object at ffff88006ada00c0, in cache kmalloc-16384 size: 16384 > Allocated: > PID = 32313 > [<ffffffff812b2686>] save_stack_trace+0x16/0x20 > arch/x86/kernel/stacktrace.c:57 > [<ffffffff81a0e3d3>] save_stack+0x43/0xd0 mm/kasan/kasan.c:502 > [<ffffffff81a0e69a>] set_track mm/kasan/kasan.c:514 [inline] > [<ffffffff81a0e69a>] kasan_kmalloc+0xaa/0xd0 mm/kasan/kasan.c:605 > [<ffffffff81a0aeeb>] kmem_cache_alloc_trace+0x10b/0x670 mm/slab.c:3629 > [<ffffffff8107910e>] kvm_arch_alloc_vm include/linux/slab.h:490 [inline] > [<ffffffff8107910e>] kvm_create_vm > arch/x86/kvm/../../../virt/kvm/kvm_main.c:613 [inline] > [<ffffffff8107910e>] kvm_dev_ioctl_create_vm > arch/x86/kvm/../../../virt/kvm/kvm_main.c:3175 [inline] > [<ffffffff8107910e>] kvm_dev_ioctl+0x1be/0x11b0 > arch/x86/kvm/../../../virt/kvm/kvm_main.c:3219 > [<ffffffff81aa55bf>] vfs_ioctl fs/ioctl.c:43 [inline] > [<ffffffff81aa55bf>] do_vfs_ioctl+0x1bf/0x1790 fs/ioctl.c:683 > [<ffffffff81aa6c1f>] SYSC_ioctl fs/ioctl.c:698 [inline] > [<ffffffff81aa6c1f>] SyS_ioctl+0x8f/0xc0 fs/ioctl.c:689 > [<ffffffff841c9941>] entry_SYSCALL_64_fastpath+0x1f/0xc2 > Freed: > PID = 32310 > [<ffffffff812b2686>] save_stack_trace+0x16/0x20 > arch/x86/kernel/stacktrace.c:57 > [<ffffffff81a0e3d3>] save_stack+0x43/0xd0 mm/kasan/kasan.c:502 > [<ffffffff81a0ed0f>] set_track mm/kasan/kasan.c:514 [inline] > [<ffffffff81a0ed0f>] kasan_slab_free+0x6f/0xb0 mm/kasan/kasan.c:578 > [<ffffffff81a0cbc3>] __cache_free mm/slab.c:3505 [inline] > [<ffffffff81a0cbc3>] kfree+0xd3/0x250 mm/slab.c:3822 > [<ffffffff81070579>] kvm_arch_free_vm include/linux/kvm_host.h:781 [inline] > [<ffffffff81070579>] kvm_destroy_vm > arch/x86/kvm/../../../virt/kvm/kvm_main.c:744 [inline] > [<ffffffff81070579>] kvm_put_kvm+0x719/0x9a0 > arch/x86/kvm/../../../virt/kvm/kvm_main.c:759 > [<ffffffff81070952>] kvm_vm_release+0x42/0x50 > arch/x86/kvm/../../../virt/kvm/kvm_main.c:770 > [<ffffffff81a5f872>] __fput+0x332/0x7f0 fs/file_table.c:208 > [<ffffffff81a5fdb5>] ____fput+0x15/0x20 fs/file_table.c:244 > [<ffffffff814ca25a>] task_work_run+0x18a/0x260 kernel/task_work.c:116 > [<ffffffff81454a57>] exit_task_work include/linux/task_work.h:21 [inline] > [<ffffffff81454a57>] do_exit+0x18e7/0x28a0 kernel/exit.c:839 > [<ffffffff8145a469>] do_group_exit+0x149/0x420 kernel/exit.c:943 > [<ffffffff81489730>] get_signal+0x7e0/0x1820 kernel/signal.c:2313 > [<ffffffff8127ca92>] do_signal+0xd2/0x2190 arch/x86/kernel/signal.c:807 > [<ffffffff81007900>] exit_to_usermode_loop+0x200/0x2a0 > arch/x86/entry/common.c:156 > [<ffffffff81009413>] prepare_exit_to_usermode > arch/x86/entry/common.c:190 [inline] > [<ffffffff81009413>] syscall_return_slowpath+0x4d3/0x570 > arch/x86/entry/common.c:259 > [<ffffffff841c99e2>] entry_SYSCALL_64_fastpath+0xc0/0xc2 > Memory state around the buggy address: > ffff88006ada0100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb > ffff88006ada0180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb > >ffff88006ada0200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb > ^ > ffff88006ada0280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb > ffff88006ada0300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb > ================================================================== > Disabling lock debugging due to kernel taint > Kernel panic - not syncing: panic_on_warn set ... > > CPU: 2 PID: 1073 Comm: kworker/2:2 Tainted: G B 4.10.0-rc3+ #164 > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 > Workqueue: events_power_efficient process_srcu > Call Trace: > __dump_stack lib/dump_stack.c:15 [inline] > dump_stack+0x2ee/0x3ef lib/dump_stack.c:51 > panic+0x1fb/0x412 kernel/panic.c:179 > kasan_end_report+0x5b/0x60 mm/kasan/report.c:144 > kasan_report_error mm/kasan/report.c:291 [inline] > kasan_report+0x386/0x460 mm/kasan/report.c:307 > __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:333 > rcu_batch_empty kernel/rcu/srcu.c:64 [inline] > rcu_batch_dequeue kernel/rcu/srcu.c:75 [inline] > srcu_invoke_callbacks kernel/rcu/srcu.c:624 [inline] > process_srcu+0x275/0x280 kernel/rcu/srcu.c:672 > process_one_work+0xbd0/0x1c10 kernel/workqueue.c:2098 > worker_thread+0x223/0x1990 kernel/workqueue.c:2232 > kthread+0x326/0x3f0 kernel/kthread.c:227 > ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:427 > Dumping ftrace buffer: > (ftrace buffer empty) > Kernel Offset: disabled > -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html