On Wed, Apr 13, 2022, Ben Gardon wrote:
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 72183ae628f7..021452a9fa91 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -7855,6 +7855,19 @@ At this time, KVM_PMU_CAP_DISABLE is the only capability. Setting
>  this capability will disable PMU virtualization for that VM. Usermode
>  should adjust CPUID leaf 0xA to reflect that the PMU is disabled.
>  
> +8.36 KVM_CAP_VM_DISABLE_NX_HUGE_PAGES
> +---------------------------
> +
> +:Capability KVM_CAP_PMU_CAPABILITY
> +:Architectures: x86
> +:Type: vm
> +:Returns 0 on success, -EPERM if the userspace process does not
> +         have CAP_SYS_BOOT

Needs to document the -EINVAL cases, especially the requirement that this be
called before vCPUs are created. The

> +This capability disables the NX huge pages mitigation for iTLB MULTIHIT.
> +
> +The capability has no effect if the nx_huge_pages module parameter is not set.
> +
>  9. Known KVM API problems
>  =========================
>  
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 2c20f715f009..b8ab4fa7d4b2 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1240,6 +1240,8 @@ struct kvm_arch {
>  	hpa_t hv_root_tdp;
>  	spinlock_t hv_root_tdp_lock;
>  #endif
> +
> +	bool disable_nx_huge_pages;
>  };
>  
>  struct kvm_vm_stat {
> diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
> index 671cfeccf04e..148f630af78a 100644
> --- a/arch/x86/kvm/mmu.h
> +++ b/arch/x86/kvm/mmu.h
> @@ -173,9 +173,10 @@ struct kvm_page_fault {
>  int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault);
>  
>  extern int nx_huge_pages;
> -static inline bool is_nx_huge_page_enabled(void)
> +static inline bool is_nx_huge_page_enabled(struct kvm *kvm)
>  {
> -	return READ_ONCE(nx_huge_pages);
> +	return READ_ONCE(nx_huge_pages) &&
> +	       !kvm->arch.disable_nx_huge_pages;

No need for a newline, that fits on a
single line.

> diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> index 566548a3efa7..03aa1e0f60e2 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.c
> +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> @@ -1469,7 +1469,8 @@ static int tdp_mmu_split_huge_page(struct kvm *kvm, struct tdp_iter *iter,
>  	 * not been linked in yet and thus is not reachable from any other CPU.
>  	 */
>  	for (i = 0; i < PT64_ENT_PER_PAGE; i++)
> -		sp->spt[i] = make_huge_page_split_spte(huge_spte, level, i);
> +		sp->spt[i] = make_huge_page_split_spte(kvm, huge_spte,
> +						       level, i);

Just let this poke past 80 chars.

> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 665c1fa8bb57..27631c3b53c2 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -4286,6 +4286,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>  	case KVM_CAP_SYS_ATTRIBUTES:
>  	case KVM_CAP_VAPIC:
>  	case KVM_CAP_ENABLE_CAP:
> +	case KVM_CAP_VM_DISABLE_NX_HUGE_PAGES:
>  		r = 1;
>  		break;
>  	case KVM_CAP_EXIT_HYPERCALL:
> @@ -6079,6 +6080,28 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
>  		}
>  		mutex_unlock(&kvm->lock);
>  		break;
> +	case KVM_CAP_VM_DISABLE_NX_HUGE_PAGES:
> +		r = -EINVAL;
> +		if (cap->args[0])
> +			break;
> +
> +		/*
> +		 * Since the risk of disabling NX hugepages is a guest crashing
> +		 * the system, ensure the userspace process has permission to
> +		 * reboot the system.

Since I'm nitpicking already and there's also a comment... Can you call out that,
unlike the actual reboot() syscall, the process needs the capability in the init?
namespace (I don't actually know the terminology) because exposing /dev/kvm into
a container doesn't magically limit the iTLB multihit bug to that container. I.e.
that this _must_ use capable(), not ns_capable().

Amusingly, someone could subvert the selftest's SYS_reboot heuristic by running
the test in a container :-)
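To make the return values concrete, here's a rough userspace model of the semantics under review. It's a sketch under stated assumptions, not KVM code: every model_* name is made up, the init_ns_sys_boot flag stands in for capable(CAP_SYS_BOOT) (i.e. the capability in the initial user namespace, not ns_capable()), and the vCPU-count check with its -EINVAL is assumed from the "call it before vCPUs exist" requirement, since that part of x86.c isn't quoted here.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical userspace model; none of these names exist in KVM.
 * model_nx_huge_pages stands in for the nx_huge_pages module param.
 */
static bool model_nx_huge_pages = true;

struct model_vm {
	bool disable_nx_huge_pages;	/* models kvm->arch.disable_nx_huge_pages */
	int created_vcpus;		/* assumption: models kvm->created_vcpus */
};

/* Per-VM check, kept on one line as suggested in the review. */
bool model_is_nx_huge_page_enabled(const struct model_vm *vm)
{
	return model_nx_huge_pages && !vm->disable_nx_huge_pages;
}

/*
 * Models KVM_ENABLE_CAP(KVM_CAP_VM_DISABLE_NX_HUGE_PAGES).
 * init_ns_sys_boot models capable(CAP_SYS_BOOT): CAP_SYS_BOOT held in the
 * initial user namespace.  A container-local grant must not count, because
 * the iTLB multihit exposure isn't confined to the container.
 */
int model_disable_nx_huge_pages(struct model_vm *vm, uint64_t arg0,
				bool init_ns_sys_boot)
{
	if (arg0)
		return -EINVAL;		/* no flags are defined */
	if (!init_ns_sys_boot)
		return -EPERM;
	if (vm->created_vcpus)
		return -EINVAL;		/* assumed: must precede vCPU creation */
	vm->disable_nx_huge_pages = true;
	return 0;
}

/* Walks the return values the api.rst entry would need to list. */
void model_self_check(void)
{
	struct model_vm vm = { 0 };

	assert(model_disable_nx_huge_pages(&vm, 1, true) == -EINVAL);
	assert(model_disable_nx_huge_pages(&vm, 0, false) == -EPERM);

	vm.created_vcpus = 1;
	assert(model_disable_nx_huge_pages(&vm, 0, true) == -EINVAL);
	assert(model_is_nx_huge_page_enabled(&vm));

	vm.created_vcpus = 0;
	assert(model_disable_nx_huge_pages(&vm, 0, true) == 0);
	assert(!model_is_nx_huge_page_enabled(&vm));
}
```

The check order (flags, then permission, then vCPU state) follows the quoted hunk as far as it goes; the last step is the assumed part. Clearing model_nx_huge_pages shows why the capability "has no effect" when the module param is off: the per-VM flag is simply never consulted.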