On Mon, Dec 5, 2022 at 10:47 AM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
>
> Side topic, the shortlog is way, way too long. The purpose of the shortlog is
> to provide a synopsis of the change, not to describe the change in detail.
>
> I also think this patch should be 2/2, with the more generic support added along
> with the module param (or capability) in 1/2. E.g. to yield something like
>
> KVM: x86/mmu: Add a module param to make per-vCPU SPTs NUMA aware
> KVM: x86/mmu: Honor NUMA awareness for per-VM page table allocations
>
> On Mon, Dec 05, 2022, Ben Gardon wrote:
> > > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > > index 4736d7849c60..0554dfc55553 100644
> > > --- a/arch/x86/kvm/mmu/mmu.c
> > > +++ b/arch/x86/kvm/mmu/mmu.c
> > > @@ -90,6 +90,9 @@ __MODULE_PARM_TYPE(nx_huge_pages_recovery_period_ms, "uint");
> > >  static bool __read_mostly force_flush_and_sync_on_reuse;
> > >  module_param_named(flush_on_reuse, force_flush_and_sync_on_reuse, bool, 0644);
> > >
> > > +static bool __read_mostly numa_aware_pagetable = true;
> > > +module_param_named(numa_aware_pagetable, numa_aware_pagetable, bool, 0644);
> > > +
> >
> > I'm usually all for having module params to control things, but in
> > this case I don't think it provides much value because whether this
> > NUMA optimization is useful or not is going to depend more on VM size
> > and workload than anything else. If we wanted to make this
> > configurable, a VM capability would probably be a better mechanism so
> > that userspace could leave it off when running small,
> > non-performance-sensitive VMs
>
> Would we actually want to turn it off in this case? IIUC, @nid is just the
> preferred node, i.e. failure to allocate for the preferred @nid will result in
> falling back to other nodes, not outright failure. So the pathological worst
> case scenario would be that for a system with VMs that don't care about
> performance, all of a node's memory is allocated due to all VMs starting on
> that node.
>
> On the flip side, if a system had a mix of VM shapes, I think we'd want even
> the performance-insensitive VMs to be NUMA aware so that they can be
> sequestered on their own node(s), i.e. don't "steal" memory from the VMs that
> are performance sensitive and have been affined to a single node.

Yeah, the only reason to turn it off would be to save memory. As a strawman,
if you had 100 1-vCPU VMs on a 2-node system, you'd have 4000 pages allocated
in caches, doing nothing.

> > and turn it on when running large, multi-node VMs. A whole-host module
> > parameter seems overly restrictive.