Re: [PATCH] KVM: x86/mmu: Add "never" option to allow sticky disabling of nx_huge_pages

Sean Christopherson <seanjc@xxxxxxxxxx> · Wed, 12 Jul 2023 16:47:24 -0700

On Wed, Jul 12, 2023, Like Xu wrote:
> On 2023/6/15 03:07, Sean Christopherson wrote:
> > On Wed, Jun 14, 2023, Luiz Capitulino wrote:
> > > > Applied to kvm-x86 mmu.  I kept the default as "auto" for now, as that can go on
> > > > top and I don't want to introduce that change this late in the cycle.  If no one
> > > > beats me to the punch (hint, hint ;-) ), I'll post a patch to make "never" the
> > > > default for unaffected hosts so that we can discuss/consider that change for 6.6.
> > > 
> > > Thanks Sean, I agree with the plan. I could give a try on the patch if you'd like.
> > 
> > Yes please, thanks!
> 
> As a KVM/x86 *feature*, splitting and reconstructing large pages has
> other potential use cases beyond the itlb_multihit mitigation, e.g.
> making performance test comparisons easier.

Enabling and disabling dirty logging is a far better tool for that, as it gives
userspace much more explicit control over what pages are split/reconstituted,
and when.

> On unaffected machines (ICX and later), nx_huge_pages is already "N",
> and turning it into "never" doesn't help materially in the mitigation
> implementation, but loses flexibility.

I'm becoming more and more convinced that losing the flexibility is perfectly
acceptable.  There's a very good argument to be made that mitigating DoS attacks
from the guest kernel should be done several levels up, e.g. by refusing to create
VMs for a customer that is bringing down hosts.  As Jim has pointed out, plugging
the hole only works if you are 100% confident there are no other holes, and will
never be other holes.

> IMO, the real issue here is that the kernel thread
> "kvm-nx-lpage-recovery" is created unconditionally. We also need to be
> aware of the existence of commit 084cc29f8bbb ("KVM: x86/MMU: Allow NX
> huge pages to be disabled on a per-vm basis").
> 
> One of the technical proposals is to defer kvm_vm_create_worker_thread()
> to kvm_mmu_create() or kvm_init_mmu(), based on
> kvm->arch.disable_nx_huge_pages, even until guest paging mode is enabled
> on the first vcpu.
> 
> Is this step worth taking?

IMO, no.  In hindsight, adding KVM_CAP_VM_DISABLE_NX_HUGE_PAGES was likely a
mistake; requiring CAP_SYS_BOOT makes it annoyingly difficult to safely use the
capability.  My preference at this point is to make changes to the NX hugepage
mitigation only when there is a substantial benefit to an already-deployed use case.