Re: [PATCH] kvm: x86: Increase MAX_VCPUS to 710

Juergen Gross <jgross@xxxxxxxx> · Fri, 3 Sep 2021 10:13:43 +0200

On 01.09.21 17:25, Eduardo Habkost wrote:
On Wed, Sep 01, 2021 at 04:42:08PM +0200, Vitaly Kuznetsov wrote:
Igor Mammedov <imammedo@xxxxxxxxxx> writes:

On Wed, 01 Sep 2021 12:13:55 +0200
Vitaly Kuznetsov <vkuznets@xxxxxxxxxx> wrote:

Igor Mammedov <imammedo@xxxxxxxxxx> writes:

On Wed, 01 Sep 2021 10:02:18 +0200
Vitaly Kuznetsov <vkuznets@xxxxxxxxxx> wrote:
  
Eduardo Habkost <ehabkost@xxxxxxxxxx> writes:
   
Support for 710 VCPUs has been tested by Red Hat since RHEL-8.4.
Increase KVM_MAX_VCPUS and KVM_SOFT_MAX_VCPUS to 710.

For reference, visible effects of changing KVM_MAX_VCPUS are:
- KVM_CAP_MAX_VCPUS and KVM_CAP_NR_VCPUS will now return 710 (of course)
- Default value for CPUID[HYPERV_CPUID_IMPLEMENT_LIMITS (00x40000005)].EAX
   will now be 710
- Bitmap stack variables that will grow:
   - At kvm_hv_flush_tlb()  kvm_hv_send_ipi():
     - Sparse VCPU bitmap (vp_bitmap) will be 96 bytes long
     - vcpu_bitmap will be 92 bytes long
   - vcpu_bitmap at bioapic_write_indirect() will be 92 bytes long
     once patch "KVM: x86: Fix stack-out-of-bounds memory access
     from ioapic_write_indirect()" is applied

Signed-off-by: Eduardo Habkost <ehabkost@xxxxxxxxxx>
---
  arch/x86/include/asm/kvm_host.h | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index af6ce8d4c86a..f76fae42bf45 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -37,8 +37,8 @@
  
  #define __KVM_HAVE_ARCH_VCPU_DEBUGFS
  
-#define KVM_MAX_VCPUS 288
-#define KVM_SOFT_MAX_VCPUS 240
+#define KVM_MAX_VCPUS 710

Out of pure curiosity, where did 710 came from? Is this some particular
hardware which was used for testing (weird number btw). Should we maybe
go to e.g. 1024 for the sake of the beauty of powers of two? :-)

710 wasn't tested with real VMs yet due to userspace limitations
that still need to be addressed (specifically, due to SMBIOS 2.1
table size limits).

I would be more than happy to set it to 1024 or 2048 if the KVM
maintainers agree.  :)

For reference, RHEL-8.4 is compiled with KVM_MAX_VCPUS=2048, but
userspace enforces a 710 VCPU limit.

   
+#define KVM_SOFT_MAX_VCPUS 710

Do we really need KVM_SOFT_MAX_VCPUS which is equal to KVM_MAX_VCPUS?

Reading

commit 8c3ba334f8588e1d5099f8602cf01897720e0eca
Author: Sasha Levin <levinsasha928@xxxxxxxxx>
Date:   Mon Jul 18 17:17:15 2011 +0300

     KVM: x86: Raise the hard VCPU count limit

the idea behind KVM_SOFT_MAX_VCPUS was to allow developers to test high
vCPU numbers without claiming such configurations as supported.

I have two alternative suggestions:
1) Drop KVM_SOFT_MAX_VCPUS completely.
2) Raise it to a higher number (e.g. 2048)

I will send a RFC later proposing we make KVM_MAX_VCPUS
configurable by Kconfig, and dropping KVM_SOFT_MAX_VCPUS.

   
  #define KVM_MAX_VCPU_ID 1023

1023 may not be enough now. I rememeber there was a suggestion to make
max_vcpus configurable via module parameter and this question was
raised:

https://lore.kernel.org/lkml/878s292k75.fsf@xxxxxxxxxxxxxxxxxxxx/

TL;DR: to support EPYC-like topologies we need to keep
  KVM_MAX_VCPU_ID = 4 * KVM_MAX_VCPUS

1024 seems to be enough on all the CPU topologies I have seen,
but I can happily implement the 4x rule below, just to be sure.


VCPU_ID (sequential 0-n range) is not APIC ID (sparse distribution),
so topology encoded in the later should be orthogonal to VCPU_ID.

Why do we even have KVM_MAX_VCPU_ID which is != KVM[_SOFT]_MAX_VCPUS
then?
I'd say for compat reasons (8c3ba334f85 KVM: x86: Raise the hard VCPU count limit)

qemu warns users that they are out of recommended (tested) limit when
it sees requested maxcpus over soft limit.
See soft_vcpus_limit in qemu.


That's the reason why we have KVM_SOFT_MAX_VCPUS in addition to
KVM_MAX_VCPUS, not why we have KVM_MAX_VCPU_ID :-)


KVM_MAX_VCPU_ID is only checked in kvm_vm_ioctl_create_vcpu() which
passes 'id' down to kvm_vcpu_init() which, in its turn, sets
'vcpu->vcpu_id'. This is, for example, returned by kvm_x2apic_id():

static inline u32 kvm_x2apic_id(struct kvm_lapic *apic)
{
         return apic->vcpu->vcpu_id;
}

So I'm pretty certain this is actually APIC id and it has topology in
it.
Yep, I mixed it up with cpu_index on QEMU side,
for x86 it fetches actual apic id and feeds that to kvm when creating vCPU.

It looks like KVM_MAX_VCPU_ID (KVM_SOFT_MAX_VCPUS) is essentially
MAX_[SOFT_]APIC_ID which in some places is treated as max number of vCPUs,
so actual max count of vCPUs could be less than that (in case of sparse APIC
IDs /non power of 2 thread|core|whatever count/).

Yes. To avoid the confusion, I'd suggest we re-define KVM_MAX_VCPU_ID as
something like:

#define KVM_MAX_VCPU_ID_TO_MAX_VCPUS_RATIO 4
#define KVM_MAX_VCPU_ID (KVM_MAX_VCPUS * KVM_MAX_VCPU_ID_TO_MAX_VCPUS_RATIO)

and add a comment about sparse APIC IDs/topology.

I will submit a new version of this patch with a rule like the
above.

A 4x ratio is very generous, but the only impact of a large
KVM_MAX_VCPU_ID is a larger struct kvm_ioapic.  Changing
KVM_MAX_VCPU_ID from 1024 to 4096 will make struct kvm_ioapic
grow from 1628 bytes to 5084 bytes, which I assume is OK.


I'm just about to send V2 of my series [1] to support specifying the
max vcpu-id and max number of vcpus of a guest via command line
parameters.


Juergen

[1]: https://lore.kernel.org/kvm/20210701154105.23215-1-jgross@xxxxxxxx/
Attachment:
OpenPGP_0xB0DE9DD628BF132F.asc

Description: OpenPGP public key
Attachment:
OpenPGP_signature

Description: OpenPGP digital signature