Juergen Gross <jgross@xxxxxxxx> writes: > On 14.07.21 13:15, Vitaly Kuznetsov wrote: >> Juergen Gross <jgross@xxxxxxxx> writes: >> >>> Today the maximum number of vcpus of a kvm guest is set via a #define >>> in a header file. >>> >>> In order to support higher vcpu numbers for guests without generally >>> increasing the memory consumption of guests on the host especially on >>> very large systems add a boot parameter for specifying the number of >>> allowed vcpus for guests. >>> >>> The default will still be the current setting of 288. The value 0 has >>> the special meaning to limit the number of possible vcpus to the >>> number of possible cpus of the host. >>> >>> Signed-off-by: Juergen Gross <jgross@xxxxxxxx> >>> --- >>> Documentation/admin-guide/kernel-parameters.txt | 10 ++++++++++ >>> arch/x86/include/asm/kvm_host.h | 5 ++++- >>> arch/x86/kvm/x86.c | 7 +++++++ >>> 3 files changed, 21 insertions(+), 1 deletion(-) >>> >>> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt >>> index 99bfa53a2bbd..8eb856396ffa 100644 >>> --- a/Documentation/admin-guide/kernel-parameters.txt >>> +++ b/Documentation/admin-guide/kernel-parameters.txt >>> @@ -2373,6 +2373,16 @@ >>> guest can't have more vcpus than the set value + 1. >>> Default: 1023 >>> >>> + kvm.max_vcpus= [KVM,X86] Set the maximum allowed numbers of vcpus per >>> + guest. The special value 0 sets the limit to the number >>> + of physical cpus possible on the host (including not >>> + yet hotplugged cpus). Higher values will result in >>> + slightly higher memory consumption per guest. Depending >>> + on the value and the virtual topology the maximum >>> + allowed vcpu-id might need to be raised, too (see >>> + kvm.max_vcpu_id parameter). >> >> I'd suggest to at least add a sanity check: 'max_vcpu_id' should always >> be >= 'max_vcpus'. Alternatively, we can replace 'max_vcpu_id' with say >> 'vcpu_id_to_vcpus_ratio' and set it to e.g. '4' by default. > > Either would be fine with me. > > A default of '2' for the ratio would seem more appropriate for me, > however. A thread count per core not being a power of 2 is quite > unlikely, and the worst case scenario for cores per socket would be > 2^n + 1. > (I vaguely recall AMD EPYC had more than thread id (package id?) encapsulated into APIC id). Personally, I'd vote for introducing a 'ratio' parameter then so generally users will only have to set 'kvm.max_vcpus'. >> >>> + Default: 288 >>> + >>> l1tf= [X86] Control mitigation of the L1TF vulnerability on >>> affected CPUs >>> >>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h >>> index 39cbc4b6bffb..65ae82a5d444 100644 >>> --- a/arch/x86/include/asm/kvm_host.h >>> +++ b/arch/x86/include/asm/kvm_host.h >>> @@ -37,7 +37,8 @@ >>> >>> #define __KVM_HAVE_ARCH_VCPU_DEBUGFS >>> >>> -#define KVM_MAX_VCPUS 288 >>> +#define KVM_DEFAULT_MAX_VCPUS 288 >>> +#define KVM_MAX_VCPUS max_vcpus >>> #define KVM_SOFT_MAX_VCPUS 240 >>> #define KVM_DEFAULT_MAX_VCPU_ID 1023 >>> #define KVM_MAX_VCPU_ID max_vcpu_id >>> @@ -1509,6 +1510,8 @@ extern u64 kvm_max_tsc_scaling_ratio; >>> extern u64 kvm_default_tsc_scaling_ratio; >>> /* bus lock detection supported? */ >>> extern bool kvm_has_bus_lock_exit; >>> +/* maximum number of vcpus per guest */ >>> +extern unsigned int max_vcpus; >>> /* maximum vcpu-id */ >>> extern unsigned int max_vcpu_id; >>> /* per cpu vcpu bitmasks (disable preemption during usage) */ >>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c >>> index a9b0bb2221ea..888c4507504d 100644 >>> --- a/arch/x86/kvm/x86.c >>> +++ b/arch/x86/kvm/x86.c >>> @@ -177,6 +177,10 @@ module_param(force_emulation_prefix, bool, S_IRUGO); >>> int __read_mostly pi_inject_timer = -1; >>> module_param(pi_inject_timer, bint, S_IRUGO | S_IWUSR); >>> >>> +unsigned int __read_mostly max_vcpus = KVM_DEFAULT_MAX_VCPUS; >>> +module_param(max_vcpus, uint, S_IRUGO); >>> +EXPORT_SYMBOL_GPL(max_vcpus); >>> + >>> unsigned int __read_mostly max_vcpu_id = KVM_DEFAULT_MAX_VCPU_ID; >>> module_param(max_vcpu_id, uint, S_IRUGO); >>> >>> @@ -10648,6 +10652,9 @@ int kvm_arch_hardware_setup(void *opaque) >>> if (boot_cpu_has(X86_FEATURE_XSAVES)) >>> rdmsrl(MSR_IA32_XSS, host_xss); >>> >>> + if (max_vcpus == 0) >>> + max_vcpus = num_possible_cpus(); >> >> Is this special case really needed? I mean 'max_vcpus' is not '0' by >> default so whoever sets it manually probably knows how big his guests >> are going to be anyway and it is not always obvious how many CPUs are >> reported by 'num_possible_cpus()' (ACPI tables can be weird for example). > > The idea was to make it easy for anyone managing a large fleet of hosts > and wanting to have a common setting for all of them. > I see. It seems to be uncommon indeed to run guests with more vCPUs than host pCPUs so everything >= num_online_cpus() should be OK. My only concern about num_possible_cpus() is that it is going to be hard to explain what 'possible CPUs' mean (but whoever cares that much about wasting memory can always set the required value manually). > It would even be possible to use '0' as the default (probably via config > option only). > >> >>> + >>> kvm_pcpu_vcpu_mask = __alloc_percpu(KVM_VCPU_MASK_SZ, >>> sizeof(unsigned long)); >>> kvm_hv_vp_bitmap = __alloc_percpu(KVM_HV_VPMAP_SZ, sizeof(u64)); >> -- Vitaly