On Thu, Jan 23, 2025, Keith Busch wrote:
> From: Keith Busch <kbusch@xxxxxxxxxx>
> 
> Some libraries want to ensure they are single threaded before forking,
> so making the kernel's kvm huge page recovery process a vhost task of
> the user process breaks those. The minijail library used by crosvm is
> one such affected application.
> 
> Defer the task to after the first VM_RUN call, which occurs after the
> parent process has forked all its jailed processes. This needs to happen
> only once for the kvm instance, so this patch introduces infrastructure
> to do that (Suggested-by Paolo).
> 
> Cc: Sean Christopherson <seanjc@xxxxxxxxxx>
> Cc: Paolo Bonzini <pbonzini@xxxxxxxxxx>
> Tested-by: Alyssa Ross <hi@xxxxxxxxx>
> Signed-off-by: Keith Busch <kbusch@xxxxxxxxxx>
> ---
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 26b4ba7e7cb5e..a45ae60e84ab4 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -7447,20 +7447,28 @@ static bool kvm_nx_huge_page_recovery_worker(void *data)
>  	return true;
>  }
>  
> -int kvm_mmu_post_init_vm(struct kvm *kvm)
> +static void kvm_mmu_start_lpage_recovery(struct once *once)
>  {
> -	if (nx_hugepage_mitigation_hard_disabled)
> -		return 0;
> +	struct kvm_arch *ka = container_of(once, struct kvm_arch, nx_once);
> +	struct kvm *kvm = container_of(ka, struct kvm, arch);
>  
>  	kvm->arch.nx_huge_page_last = get_jiffies_64();
>  	kvm->arch.nx_huge_page_recovery_thread = vhost_task_create(
>  		kvm_nx_huge_page_recovery_worker, kvm_nx_huge_page_recovery_worker_kill,
>  		kvm, "kvm-nx-lpage-recovery");
>  
> +	if (kvm->arch.nx_huge_page_recovery_thread)
> +		vhost_task_start(kvm->arch.nx_huge_page_recovery_thread);
> +}
> +
> +int kvm_mmu_post_init_vm(struct kvm *kvm)
> +{
> +	if (nx_hugepage_mitigation_hard_disabled)
> +		return 0;
> +
> +	call_once(&kvm->arch.nx_once, kvm_mmu_start_lpage_recovery);
>  	if (!kvm->arch.nx_huge_page_recovery_thread)
>  		return -ENOMEM;
> -
> -	vhost_task_start(kvm->arch.nx_huge_page_recovery_thread);
>  	return 0;
>  }
>  
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 6e248152fa134..6d4a6734b2d69 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -11471,6 +11471,10 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
>  	struct kvm_run *kvm_run = vcpu->run;
>  	int r;
>  
> +	r = kvm_mmu_post_init_vm(vcpu->kvm);
> +	if (r)
> +		return r;

This is broken. If the module param is toggled before the first KVM_RUN, KVM will
hit a NULL pointer deref due to trying to start a non-existent vhost task:

  BUG: kernel NULL pointer dereference, address: 0000000000000040
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD 0 P4D 0
  Oops: Oops: 0000 [#1] SMP
  CPU: 16 UID: 0 PID: 1190 Comm: bash Not tainted 6.13.0-rc3-9bb02e874121-x86/xen_msr_fixes-vm #2382
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
  RIP: 0010:vhost_task_wake+0x5/0x10
  Call Trace:
   <TASK>
   set_nx_huge_pages+0xcc/0x1e0 [kvm]
   param_attr_store+0x8a/0xd0
   module_attr_store+0x1a/0x30
   kernfs_fop_write_iter+0x12f/0x1e0
   vfs_write+0x233/0x3e0
   ksys_write+0x60/0xd0
   do_syscall_64+0x5b/0x160
   entry_SYSCALL_64_after_hwframe+0x4b/0x53
  RIP: 0033:0x7f3b52710104
   </TASK>
  Modules linked in: kvm_intel kvm
  CR2: 0000000000000040
  ---[ end trace 0000000000000000 ]---
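For illustration only, here is a minimal userspace model of the ordering problem. Every name below (`fake_vm`, `fake_task`, `model_call_once()`, `model_first_run()`, `model_param_toggle()`, `model_destroy_vm()`) is hypothetical and is not kernel API. The sketch assumes a single thread, so unlike a real once primitive it does no locking. What it shows is that once task creation is deferred to the first run, the task pointer is legitimately NULL for a VM that the param-store path can already see, so that path has to tolerate NULL instead of waking unconditionally.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a vhost task; not the kernel's struct vhost_task. */
struct fake_task {
	int wakeups;
};

/* Hypothetical per-VM state, loosely modeled on kvm->arch in the patch. */
struct fake_vm {
	bool once_done;                  /* models the "already ran" state of a once */
	struct fake_task *recovery_task; /* stays NULL until the first run */
};

/*
 * Single-threaded model of a run-once helper: invokes fn at most once per VM.
 * A real once primitive must also serialize concurrent callers; this does not.
 */
static void model_call_once(struct fake_vm *vm, void (*fn)(struct fake_vm *))
{
	assert(fn != NULL);
	if (vm->once_done)
		return;
	fn(vm);
	vm->once_done = true;
}

/* Deferred creation, analogous to kvm_mmu_start_lpage_recovery() in the patch. */
static void start_recovery(struct fake_vm *vm)
{
	vm->recovery_task = calloc(1, sizeof(*vm->recovery_task));
}

/* Models the first-run hook: create at most once, then report allocation failure. */
static int model_first_run(struct fake_vm *vm)
{
	model_call_once(vm, start_recovery);
	return vm->recovery_task ? 0 : -1;
}

/*
 * Models the module-param store path (set_nx_huge_pages() in the trace).
 * Without the NULL check, a toggle before the first run dereferences NULL.
 */
static void model_param_toggle(struct fake_vm *vm)
{
	if (!vm->recovery_task)
		return; /* no task created yet: nothing to wake */
	vm->recovery_task->wakeups++;
}

/* Releases the modeled task. */
static void model_destroy_vm(struct fake_vm *vm)
{
	free(vm->recovery_task);
	vm->recovery_task = NULL;
}
```

In this model, toggling the param on a fresh VM is a no-op, the first `model_first_run()` creates the task exactly once (repeat calls reuse it), and only toggles after that point reach the task. Whether the eventual kernel fix uses the same guard is not shown here.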