On 06/11/2017 17:01, David Hildenbrand wrote: > On 06.11.2017 16:10, Nick Desaulniers wrote: >> Does it have to be stack allocated? > > We can't use kmalloc and friends in emulate.c. We would have to > introduce new emulator callbacks. > > a) for malloc and free. hmmm. > b) for carrying out the fxrstr/fixup. > > Paolo, what do you suggest? You can use kmalloc. Any userspace user of emulate.c would have to write a wrapper. But I'm not sure it's useful... maybe the asm_safe+memcpy could be moved to a separate noinline function, so that segmented_read_std is invoked with a leaner stack. Paolo >> >> On Nov 6, 2017 3:52 AM, "David Hildenbrand" <david@xxxxxxxxxx >> <mailto:david@xxxxxxxxxx>> wrote: >> >> On 31.10.2017 12:34, syzbot wrote: >> > Hello, >> > >> > syzkaller hit the following crash on >> > 91dfed74eabcdae9378131546c446442c29bf769 >> > >> git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/master >> <http://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/master> >> > compiler: gcc (GCC) 7.1.1 20170620 >> > .config is attached >> > Raw console output is attached. >> > C reproducer is attached >> > syzkaller reproducer is attached. See https://goo.gl/kgGztJ >> > for information about syzkaller reproducers >> > >> > >> > in_atomic(): 1, irqs_disabled(): 0, pid: 2909, name: syzkaller879109 >> > 2 locks held by syzkaller879109/2909: >> > #0: (&vcpu->mutex){+.+.}, at: [<ffffffff8106222c>] >> vcpu_load+0x1c/0x70 >> > arch/x86/kvm/../../../virt/kvm/kvm_main.c:154 >> > #1: (&kvm->srcu){....}, at: [<ffffffff810dd162>] vcpu_enter_guest >> > arch/x86/kvm/x86.c:6983 [inline] >> > #1: (&kvm->srcu){....}, at: [<ffffffff810dd162>] vcpu_run >> > arch/x86/kvm/x86.c:7061 [inline] >> > #1: (&kvm->srcu){....}, at: [<ffffffff810dd162>] >> > kvm_arch_vcpu_ioctl_run+0x1bc2/0x58b0 arch/x86/kvm/x86.c:7222 >> > CPU: 1 PID: 2909 Comm: syzkaller879109 Not tainted >> 4.13.0-rc4-next-20170811 >> > #1 >> > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs >> 01/01/2011 >> > Call Trace: >> > __dump_stack lib/dump_stack.c:16 [inline] >> > dump_stack+0x194/0x257 lib/dump_stack.c:52 >> > ___might_sleep+0x2b2/0x470 kernel/sched/core.c:6014 >> > __might_sleep+0x95/0x190 kernel/sched/core.c:5967 >> > __might_fault+0xab/0x1d0 mm/memory.c:4383 >> > __copy_from_user include/linux/uaccess.h:71 [inline] >> > __kvm_read_guest_page+0x58/0xa0 >> > arch/x86/kvm/../../../virt/kvm/kvm_main.c:1771 >> > kvm_vcpu_read_guest_page+0x44/0x60 >> > arch/x86/kvm/../../../virt/kvm/kvm_main.c:1791 >> > kvm_read_guest_virt_helper+0x76/0x140 arch/x86/kvm/x86.c:4407 >> > kvm_read_guest_virt_system+0x3c/0x50 arch/x86/kvm/x86.c:4466 >> > segmented_read_std+0x10c/0x180 arch/x86/kvm/emulate.c:819 >> > em_fxrstor+0x27b/0x410 arch/x86/kvm/emulate.c:4022 >> >> >> In em_fxrstor, we do a get_fpu(), which in return disables preemption. >> >> With preempt_disable(), we do a >> segmented_read_std()->kvm_vcpu_read_guest_page(), triggering the >> warning. >> >> > x86_emulate_insn+0x55d/0x3c50 arch/x86/kvm/emulate.c:5471 >> > x86_emulate_instruction+0x411/0x1ca0 arch/x86/kvm/x86.c:5698 >> > kvm_mmu_page_fault+0x18b/0x2c0 arch/x86/kvm/mmu.c:4854 >> > handle_ept_violation+0x1fc/0x5e0 arch/x86/kvm/vmx.c:6400 >> > vmx_handle_exit+0x281/0x1ab0 arch/x86/kvm/vmx.c:8718 >> > vcpu_enter_guest arch/x86/kvm/x86.c:6999 [inline] >> > vcpu_run arch/x86/kvm/x86.c:7061 [inline] >> > kvm_arch_vcpu_ioctl_run+0x1cee/0x58b0 arch/x86/kvm/x86.c:7222 >> > kvm_vcpu_ioctl+0x64c/0x1010 >> arch/x86/kvm/../../../virt/kvm/kvm_main.c:2591 >> > vfs_ioctl fs/ioctl.c:45 [inline] >> > do_vfs_ioctl+0x1b1/0x1520 fs/ioctl.c:685 >> > SYSC_ioctl fs/ioctl.c:700 [inline] >> > SyS_ioctl+0x8f/0xc0 fs/ioctl.c:691 >> > entry_SYSCALL_64_fastpath+0x1f/0xbe >> >> I don't really see a way to avoid two fxstate variables. Unloading the >> fpu in between the fxstore/fxrstr could lead to host values getting >> overwritten. Loading/saving the fpu in kvm_arch_vcpu_ioctl_run() would >> most probably also not work, as the relevant portions of fxregs_state >> would not get saved/restored. So the preemption would still be needed. >> >> >> So all I can offer for now is the following (untested, can send as >> proper patch if needed): >> >> >> From f32d06c8c621c6d68e073e9bdb81a6280b6c9544 Mon Sep 17 00:00:00 2001 >> From: David Hildenbrand <david@xxxxxxxxxx <mailto:david@xxxxxxxxxx>> >> Date: Mon, 6 Nov 2017 12:35:39 +0100 >> Subject: [PATCH v1] KVM: x86: fix em_fxstor sleeping while in atomic >> >> Commit 9d643f63128b tried to optimize the stack size, but introduced a >> guest memory access which might sleep while in atomic. >> >> Let's undo that part of the commit but keep the cleanups. >> >> Reported by syzbot: >> >> in_atomic(): 1, irqs_disabled(): 0, pid: 2909, name: syzkaller879109 >> 2 locks held by syzkaller879109/2909: >> #0: (&vcpu->mutex){+.+.}, at: [<ffffffff8106222c>] >> vcpu_load+0x1c/0x70 >> arch/x86/kvm/../../../virt/kvm/kvm_main.c:154 >> #1: (&kvm->srcu){....}, at: [<ffffffff810dd162>] vcpu_enter_guest >> arch/x86/kvm/x86.c:6983 [inline] >> #1: (&kvm->srcu){....}, at: [<ffffffff810dd162>] vcpu_run >> arch/x86/kvm/x86.c:7061 [inline] >> #1: (&kvm->srcu){....}, at: [<ffffffff810dd162>] >> kvm_arch_vcpu_ioctl_run+0x1bc2/0x58b0 arch/x86/kvm/x86.c:7222 >> CPU: 1 PID: 2909 Comm: syzkaller879109 Not tainted >> 4.13.0-rc4-next-20170811 >> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs >> 01/01/2011 >> Call Trace: >> __dump_stack lib/dump_stack.c:16 [inline] >> dump_stack+0x194/0x257 lib/dump_stack.c:52 >> ___might_sleep+0x2b2/0x470 kernel/sched/core.c:6014 >> __might_sleep+0x95/0x190 kernel/sched/core.c:5967 >> __might_fault+0xab/0x1d0 mm/memory.c:4383 >> __copy_from_user include/linux/uaccess.h:71 [inline] >> __kvm_read_guest_page+0x58/0xa0 >> arch/x86/kvm/../../../virt/kvm/kvm_main.c:1771 >> kvm_vcpu_read_guest_page+0x44/0x60 >> arch/x86/kvm/../../../virt/kvm/kvm_main.c:1791 >> kvm_read_guest_virt_helper+0x76/0x140 arch/x86/kvm/x86.c:4407 >> kvm_read_guest_virt_system+0x3c/0x50 arch/x86/kvm/x86.c:4466 >> segmented_read_std+0x10c/0x180 arch/x86/kvm/emulate.c:819 >> em_fxrstor+0x27b/0x410 arch/x86/kvm/emulate.c:4022 >> x86_emulate_insn+0x55d/0x3c50 arch/x86/kvm/emulate.c:5471 >> x86_emulate_instruction+0x411/0x1ca0 arch/x86/kvm/x86.c:5698 >> kvm_mmu_page_fault+0x18b/0x2c0 arch/x86/kvm/mmu.c:4854 >> handle_ept_violation+0x1fc/0x5e0 arch/x86/kvm/vmx.c:6400 >> vmx_handle_exit+0x281/0x1ab0 arch/x86/kvm/vmx.c:8718 >> vcpu_enter_guest arch/x86/kvm/x86.c:6999 [inline] >> vcpu_run arch/x86/kvm/x86.c:7061 [inline] >> kvm_arch_vcpu_ioctl_run+0x1cee/0x58b0 arch/x86/kvm/x86.c:7222 >> kvm_vcpu_ioctl+0x64c/0x1010 >> arch/x86/kvm/../../../virt/kvm/kvm_main.c:2591 >> vfs_ioctl fs/ioctl.c:45 [inline] >> do_vfs_ioctl+0x1b1/0x1520 fs/ioctl.c:685 >> SYSC_ioctl fs/ioctl.c:700 [inline] >> SyS_ioctl+0x8f/0xc0 fs/ioctl.c:691 >> entry_SYSCALL_64_fastpath+0x1f/0xbe >> RIP: 0033:0x437fc9 >> RSP: 002b:00007ffc7b4d5ab8 EFLAGS: 00000206 ORIG_RAX: 0000000000000010 >> RAX: ffffffffffffffda RBX: 00000000004002b0 RCX: 0000000000437fc9 >> RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000005 >> RBP: 0000000000000086 R08: 0000000000000000 R09: 0000000020ae8000 >> R10: 0000000000009120 R11: 0000000000000206 R12: 0000000000000000 >> R13: 0000000000000004 R14: 0000000000000004 R15: 0000000020077000 >> >> Fixes: 9d643f63128b ("KVM: x86: avoid large stack allocations in >> em_fxrstor") >> Signed-off-by: David Hildenbrand <david@xxxxxxxxxx >> <mailto:david@xxxxxxxxxx>> >> --- >> arch/x86/kvm/emulate.c | 16 +++++++++------- >> 1 file changed, 9 insertions(+), 7 deletions(-) >> >> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c >> index fb0055953fbc..d87f01a2d6f4 100644 >> --- a/arch/x86/kvm/emulate.c >> +++ b/arch/x86/kvm/emulate.c >> @@ -4002,7 +4002,7 @@ static int em_fxsave(struct x86_emulate_ctxt >> *ctxt) >> >> static int em_fxrstor(struct x86_emulate_ctxt *ctxt) >> { >> - struct fxregs_state fx_state; >> + struct fxregs_state fx_state, fx_old; >> int rc; >> size_t size; >> >> @@ -4010,19 +4010,21 @@ static int em_fxrstor(struct >> x86_emulate_ctxt *ctxt) >> if (rc != X86EMUL_CONTINUE) >> return rc; >> >> + size = fxstate_size(ctxt); >> + rc = segmented_read_std(ctxt, ctxt->memop.addr.mem, >> &fx_state, size); >> + if (rc != X86EMUL_CONTINUE) >> + return rc; >> + >> ctxt->ops->get_fpu(ctxt); >> >> - size = fxstate_size(ctxt); >> if (size < __fxstate_size(16)) { >> - rc = asm_safe("fxsave %[fx]", , [fx] "+m"(fx_state)); >> + rc = asm_safe("fxsave %[fx]", , [fx] "+m"(fx_old)); >> if (rc != X86EMUL_CONTINUE) >> goto out; >> + memcpy(((void *)&fx_state) + size, ((void *)&fx_old) >> + size, >> + __fxstate_size(16) - size); >> } >> >> - rc = segmented_read_std(ctxt, ctxt->memop.addr.mem, >> &fx_state, size); >> - if (rc != X86EMUL_CONTINUE) >> - goto out; >> - >> if (fx_state.mxcsr >> 16) { >> rc = emulate_gp(ctxt, 0); >> goto out; >> -- >> 2.13.6 >> >> >> >> -- >> >> Thanks, >> >> David / dhildenb >> > >