On Sat, 2022-04-16 at 03:42 +0000, Sean Christopherson wrote:
> Make a KVM_REQ_APICV_UPDATE request when creating a vCPU with an
> in-kernel local APIC and APICv enabled at the module level. Consuming
> kvm_apicv_activated() and stuffing vcpu->arch.apicv_active directly can
> race with __kvm_set_or_clear_apicv_inhibit(), as vCPU creation happens
> before the vCPU is fully onlined, i.e. it won't get the request made to
> "all" vCPUs. If APICv is globally inhibited between setting apicv_active
> and onlining the vCPU, the vCPU will end up running with APICv enabled
> and trigger KVM's sanity check.
>
> Mark APICv as active during vCPU creation if APICv is enabled at the
> module level, both to be optimistic about it's final state, e.g. to avoid
> additional VMWRITEs on VMX, and because there are likely bugs lurking
> since KVM checks apicv_active in multiple vCPU creation paths. While
> keeping the current behavior of consuming kvm_apicv_activated() is
> arguably safer from a regression perspective, force apicv_active so that
> vCPU creation runs with deterministic state and so that if there are bugs,
> they are found sooner than later, i.e. not when some crazy race condition
> is hit.
>
>   WARNING: CPU: 0 PID: 484 at arch/x86/kvm/x86.c:9877 vcpu_enter_guest+0x2ae3/0x3ee0 arch/x86/kvm/x86.c:9877

I told you that this warning catches bugs. I am not disappointed!

>   Modules linked in:
>   CPU: 0 PID: 484 Comm: syz-executor361 Not tainted 5.16.13 #2
>   Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1~cloud0 04/01/2014
>   RIP: 0010:vcpu_enter_guest+0x2ae3/0x3ee0 arch/x86/kvm/x86.c:9877
>   Call Trace:
>    <TASK>
>    vcpu_run arch/x86/kvm/x86.c:10039 [inline]
>    kvm_arch_vcpu_ioctl_run+0x337/0x15e0 arch/x86/kvm/x86.c:10234
>    kvm_vcpu_ioctl+0x4d2/0xc80 arch/x86/kvm/../../../virt/kvm/kvm_main.c:3727
>    vfs_ioctl fs/ioctl.c:51 [inline]
>    __do_sys_ioctl fs/ioctl.c:874 [inline]
>    __se_sys_ioctl fs/ioctl.c:860 [inline]
>    __x64_sys_ioctl+0x16d/0x1d0 fs/ioctl.c:860
>    do_syscall_x64 arch/x86/entry/common.c:50 [inline]
>    do_syscall_64+0x38/0x90 arch/x86/entry/common.c:80
>    entry_SYSCALL_64_after_hwframe+0x44/0xae
>
> The bug was hit by a syzkaller spamming VM creation with 2 vCPUs and a
> call to KVM_SET_GUEST_DEBUG.
>
>   r0 = openat$kvm(0xffffffffffffff9c, &(0x7f0000000000), 0x0, 0x0)
>   r1 = ioctl$KVM_CREATE_VM(r0, 0xae01, 0x0)
>   ioctl$KVM_CAP_SPLIT_IRQCHIP(r1, 0x4068aea3, &(0x7f0000000000)) (async)
>   r2 = ioctl$KVM_CREATE_VCPU(r1, 0xae41, 0x0) (async)
>   r3 = ioctl$KVM_CREATE_VCPU(r1, 0xae41, 0x400000000000002)
>   ioctl$KVM_SET_GUEST_DEBUG(r3, 0x4048ae9b, &(0x7f00000000c0)={0x5dda9c14aa95f5c5})
>   ioctl$KVM_RUN(r2, 0xae80, 0x0)
>
> Reported-by: Gaoning Pan <pgn@xxxxxxxxxx>
> Reported-by: Yongkang Jia <kangel@xxxxxxxxxx>
> Fixes: 8df14af42f00 ("kvm: x86: Add support for dynamic APICv activation")
> Cc: stable@xxxxxxxxxxxxxxx
> Cc: Maxim Levitsky <mlevitsk@xxxxxxxxxx>
> Signed-off-by: Sean Christopherson <seanjc@xxxxxxxxxx>
> ---
>  arch/x86/kvm/x86.c | 15 ++++++++++++++-
>  1 file changed, 14 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 753296902535..09a270cc1c8f 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -11259,8 +11259,21 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
>  		r = kvm_create_lapic(vcpu, lapic_timer_advance_ns);
>  		if (r < 0)
>  			goto fail_mmu_destroy;
> -		if (kvm_apicv_activated(vcpu->kvm))
> +
> +		/*
> +		 * Defer evaluating inhibits until the vCPU is first run, as
> +		 * this vCPU will not get notified of any changes until this
> +		 * vCPU is visible to other vCPUs (marked online and added to
> +		 * the set of vCPUs). Opportunistically mark APICv active as
> +		 * VMX in particularly is highly unlikely to have inhibits.
> +		 * Ignore the current per-VM APICv state so that vCPU creation
> +		 * is guaranteed to run with a deterministic value, the request
> +		 * will ensure the vCPU gets the correct state before VM-Entry.
> +		 */
> +		if (enable_apicv) {
>  			vcpu->arch.apicv_active = true;
> +			kvm_make_request(KVM_REQ_APICV_UPDATE, vcpu);
> +		}
>  	} else
>  		static_branch_inc(&kvm_has_noapic_vcpu);
>

Makes sense.

Reviewed-by: Maxim Levitsky <mlevitsk@xxxxxxxxxx>

Best regards,
	Maxim Levitsky
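
P.S. For anyone following the thread, the window described in the changelog can be modelled in user space. The sketch below is a toy model only, not kernel code: every name in it (toy_vm, toy_vcpu, TOY_REQ_APICV_UPDATE and the toy_* helpers) is made up for illustration, and it assumes the simplification that an inhibit only posts a request to vCPUs that are already online. It compares snapshotting the per-VM state at creation with the patched approach of forcing the flag and queueing a request.

```c
/*
 * Toy user-space model of the vCPU-creation race. NOT kernel code:
 * all types and helpers below are hypothetical stand-ins.
 */
#include <assert.h>
#include <stdbool.h>

#define TOY_REQ_APICV_UPDATE (1u << 0)

struct toy_vm {
	bool enable_apicv;	/* stands in for the module-level knob */
	bool inhibited;		/* stands in for the per-VM inhibit state */
};

struct toy_vcpu {
	struct toy_vm *vm;
	bool online;		/* only online vCPUs see "all vCPUs" requests */
	bool apicv_active;
	unsigned int requests;
};

static bool toy_apicv_activated(const struct toy_vm *vm)
{
	return vm->enable_apicv && !vm->inhibited;
}

/* Old behaviour: snapshot the per-VM state at creation time. */
static void toy_vcpu_create_old(struct toy_vcpu *vcpu, struct toy_vm *vm)
{
	vcpu->vm = vm;
	vcpu->online = false;
	vcpu->requests = 0;
	vcpu->apicv_active = toy_apicv_activated(vm);
}

/* Patched behaviour: force the flag and defer evaluation to first run. */
static void toy_vcpu_create_new(struct toy_vcpu *vcpu, struct toy_vm *vm)
{
	vcpu->vm = vm;
	vcpu->online = false;
	vcpu->requests = 0;
	vcpu->apicv_active = false;
	if (vm->enable_apicv) {
		vcpu->apicv_active = true;
		vcpu->requests |= TOY_REQ_APICV_UPDATE;
	}
}

/* Inhibit APICv for the VM; vCPUs that are not online yet miss the request. */
static void toy_set_inhibit(struct toy_vm *vm, struct toy_vcpu *vcpus, int n)
{
	vm->inhibited = true;
	for (int i = 0; i < n; i++)
		if (vcpus[i].online)
			vcpus[i].requests |= TOY_REQ_APICV_UPDATE;
}

/*
 * Service pending requests, then report whether the vCPU's view matches
 * the VM's, i.e. whether a consistency check at entry would pass.
 */
static bool toy_vcpu_enter(struct toy_vcpu *vcpu)
{
	if (vcpu->requests & TOY_REQ_APICV_UPDATE) {
		vcpu->requests &= ~TOY_REQ_APICV_UPDATE;
		vcpu->apicv_active = toy_apicv_activated(vcpu->vm);
	}
	return vcpu->apicv_active == toy_apicv_activated(vcpu->vm);
}

/* Create, let an inhibit land before onlining, online, then enter. */
static bool toy_run_scenario(bool patched)
{
	struct toy_vm vm = { .enable_apicv = true, .inhibited = false };
	struct toy_vcpu vcpu;

	if (patched)
		toy_vcpu_create_new(&vcpu, &vm);
	else
		toy_vcpu_create_old(&vcpu, &vm);

	toy_set_inhibit(&vm, &vcpu, 1);	/* vCPU is offline: no request posted */
	vcpu.online = true;
	return toy_vcpu_enter(&vcpu);
}
```

In this model toy_run_scenario(false) leaves the vCPU with a stale apicv_active, while toy_run_scenario(true) resolves it on first entry because the request was queued at creation rather than relying on the broadcast.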