Side topic, the shortlog is way, way too long. The purpose of the shortlog is to provide a synopsis of the change, not to describe the change in detail. I also think this patch should be 2/2, with the more generic support added along with the module param (or capability) in 1/2. E.g. to yield something like KVM: x86/mmu: Add a module param to make per-vCPU SPTs NUMA aware KVM: x86/mmu: Honor NUMA awareness for per-VM page table allocations On Mon, Dec 05, 2022, Ben Gardon wrote: > > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c > > index 4736d7849c60..0554dfc55553 100644 > > --- a/arch/x86/kvm/mmu/mmu.c > > +++ b/arch/x86/kvm/mmu/mmu.c > > @@ -90,6 +90,9 @@ __MODULE_PARM_TYPE(nx_huge_pages_recovery_period_ms, "uint"); > > static bool __read_mostly force_flush_and_sync_on_reuse; > > module_param_named(flush_on_reuse, force_flush_and_sync_on_reuse, bool, 0644); > > > > +static bool __read_mostly numa_aware_pagetable = true; > > +module_param_named(numa_aware_pagetable, numa_aware_pagetable, bool, 0644); > > + > > I'm usually all for having module params to control things, but in > this case I don't think it provides much value because whether this > NUMA optimization is useful or not is going to depend more on VM size > and workload than anything else. If we wanted to make this > configurable, a VM capability would probably be a better mechanism so > that userspace could leave it off when running small, > non-performance-sensitive VMs Would we actually want to turn it off in this case? IIUC, @nid is just the preferred node, i.e. failure to allocate for the preferred @nid will result in falling back to other nodes, not outright failure. So the pathological worst case scenario would be that for a system with VMs that don't care about performance, all of a nodes memory is allocated due to all VMs starting on that node. On the flip side, if a system had a mix of VM shapes, I think we'd want even the performance insensitive VMs to be NUMA aware so that they can be sequestered on their own node(s), i.e. don't "steal" memory from the VMs that are performance sensitive and have been affined to a single node. > and turn it on when running large, multi-node VMs. A whole-host module > parameter seems overly restrictive.