On Thu, Dec 19, 2024 at 09:30:16PM +0100, Paolo Bonzini wrote:
> On Thu, Dec 19, 2024 at 7:09 PM Keith Busch <kbusch@xxxxxxxxxx> wrote:
> > > Is crosvm trying to do anything but exec? If not, it should probably use the
> > > flag.
> >
> > Good point, and I'm not sure right now. I don't think I know any crosvm
> > developer experts but I'm working on that to get a better explanation of
> > what's happening,
>
> Ok, I found the code and it doesn't exec (e.g.
> https://github.com/google/crosvm/blob/b339d3d7/src/crosvm/sys/linux/jail_warden.rs#L122),
> so that's not an option. Well, if I understand correctly from a
> cursory look at the code, crosvm is creating a jailed child process
> early, and then spawns further jails through it; so it's just this
> first process that has to cheat.
>
> One possibility on the KVM side is to delay creating the vhost_task
> until the first KVM_RUN. I don't like it but...

This option is actually kind of appealing: we don't need to change
anything on the application side to filter out kernel tasks, and we
don't add a new kernel dependency just to report these kinds of tasks
as kernel threads.

I gave it a quick try. I'm not very familiar with the code here, so I'm
not sure whether this is thread safe, but it did successfully get crosvm
booting again (a sketch of how the call could be serialized follows the
diff).

---
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 2401606db2604..422b6b06de4fe 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -7415,6 +7415,8 @@ int kvm_mmu_post_init_vm(struct kvm *kvm)
 {
 	if (nx_hugepage_mitigation_hard_disabled)
 		return 0;
+	if (kvm->arch.nx_huge_page_recovery_thread)
+		return 0;
 
 	kvm->arch.nx_huge_page_last = get_jiffies_64();
 	kvm->arch.nx_huge_page_recovery_thread = vhost_task_create(
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c79a8cc57ba42..263363c46626b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -11463,6 +11463,10 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
 	struct kvm_run *kvm_run = vcpu->run;
 	int r;
 
+	r = kvm_mmu_post_init_vm(vcpu->kvm);
+	if (r)
+		return r;
+
 	vcpu_load(vcpu);
 	kvm_sigset_activate(vcpu);
 	kvm_run->flags = 0;
@@ -12740,11 +12744,6 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 	return ret;
 }
 
-int kvm_arch_post_init_vm(struct kvm *kvm)
-{
-	return kvm_mmu_post_init_vm(kvm);
-}
-
 static void kvm_unload_vcpu_mmu(struct kvm_vcpu *vcpu)
 {
 	vcpu_load(vcpu);
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 401439bb21e3e..a219bd2d8aec8 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1596,7 +1596,6 @@ int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu);
 bool kvm_arch_dy_runnable(struct kvm_vcpu *vcpu);
 bool kvm_arch_dy_has_pending_interrupt(struct kvm_vcpu *vcpu);
 bool kvm_arch_vcpu_preempted_in_kernel(struct kvm_vcpu *vcpu);
-int kvm_arch_post_init_vm(struct kvm *kvm);
 void kvm_arch_pre_destroy_vm(struct kvm *kvm);
 void kvm_arch_create_vm_debugfs(struct kvm *kvm);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index de2c11dae2316..adacc6eaa7d9d 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1065,15 +1065,6 @@ static int kvm_create_vm_debugfs(struct kvm *kvm, const char *fdname)
 	return ret;
 }
 
-/*
- * Called after the VM is otherwise initialized, but just before adding it to
- * the vm_list.
- */
-int __weak kvm_arch_post_init_vm(struct kvm *kvm)
-{
-	return 0;
-}
-
 /*
  * Called just after removing the VM from the vm_list, but before doing any
  * other destruction.
@@ -1194,10 +1185,6 @@ static struct kvm *kvm_create_vm(unsigned long type, const char *fdname)
 	if (r)
 		goto out_err_no_debugfs;
 
-	r = kvm_arch_post_init_vm(kvm);
-	if (r)
-		goto out_err;
-
 	mutex_lock(&kvm_lock);
 	list_add(&kvm->vm_list, &vm_list);
 	mutex_unlock(&kvm_lock);
@@ -1207,8 +1194,6 @@ static struct kvm *kvm_create_vm(unsigned long type, const char *fdname)
 
 	return kvm;
 
-out_err:
-	kvm_destroy_vm_debugfs(kvm);
 out_err_no_debugfs:
 	kvm_coalesced_mmio_free(kvm);
 out_no_coalesced_mmio:
--
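On the thread-safety question above: kvm_arch_vcpu_ioctl_run() can be
entered concurrently from several vCPU threads, so two first KVM_RUNs
could race into kvm_mmu_post_init_vm() and each try to create the
recovery vhost_task. Below is a minimal sketch of one way the call
might be serialized; it is not part of the patch above, the wrapper
name is made up for illustration, and the assumption that kvm->lock can
be taken on this path would need checking against KVM's lock ordering
rules.

static int kvm_mmu_post_init_vm_once(struct kvm *kvm)
{
	int r = 0;

	/* Fast path: the recovery task was already created by an earlier KVM_RUN. */
	if (kvm->arch.nx_huge_page_recovery_thread)
		return 0;

	mutex_lock(&kvm->lock);
	/* Re-check under the lock; another vCPU may have won the race. */
	if (!kvm->arch.nx_huge_page_recovery_thread)
		r = kvm_mmu_post_init_vm(kvm);
	mutex_unlock(&kvm->lock);

	return r;
}

kvm_arch_vcpu_ioctl_run() would then call this wrapper instead of
kvm_mmu_post_init_vm() directly, so the lock and vhost_task_create()
stay off every run after the first one that sets the pointer.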