Each vCPU of a VM allocates a XIVE VP in OPAL, which is associated with
8 event queue (EQ) descriptors, one for each priority. A POWER9 socket
can handle a maximum of 1M event queues.

The powernv platform allocates NR_CPUS (== 2048) VPs for the hypervisor,
and each XIVE KVM device allocates KVM_MAX_VCPUS (== 2048) VPs. This
means that on a bi-socket system, we can create at most:

    (2 * 1M) / (8 * 2048) - 1 == 127 XIVE KVM devices

i.e. start at most 127 VMs benefiting from an in-kernel interrupt
controller. Subsequent VMs need to rely on a much slower userspace
emulated XIVE or XICS device in QEMU.

This is problematic, as one can legitimately expect to start as many
single-vCPU VMs as there are HW threads available on the system,
e.g. 144 on a bi-socket POWER9 Witherspoon.

This series allows QEMU to tell KVM how many interrupt servers are
needed, which is likely less than 2048 with a typical VM, e.g. it is
only 256 for 32 vCPUs with a guest's core stride of 8 and 1 thread per
core. With this I could run ~500 SMP1 VMs on a Witherspoon system.

Patches 1 to 3 are preliminary fixes (1 and 2 have already been posted
but are provided for convenience).

--
Greg

---

Cédric Le Goater (1):
      KVM: PPC: Book3S HV: XIVE: initialize private pointer when VPs are allocated

Greg Kurz (5):
      KVM: PPC: Book3S HV: XIVE: Set kvm->arch.xive when VPs are allocated
      KVM: PPC: Book3S HV: XIVE: Ensure VP isn't already in use
      KVM: PPC: Book3S HV: XIVE: Compute the VP id in a common helper
      KVM: PPC: Book3S HV: XIVE: Make VP block size configurable
      KVM: PPC: Book3S HV: XIVE: Allow userspace to set the # of VPs

 Documentation/virt/kvm/devices/xics.txt |   14 +++
 Documentation/virt/kvm/devices/xive.txt |    8 ++
 arch/powerpc/include/uapi/asm/kvm.h     |    3 +
 arch/powerpc/kvm/book3s_xive.c          |  145 +++++++++++++++++++++++++------
 arch/powerpc/kvm/book3s_xive.h          |   17 ++++
 arch/powerpc/kvm/book3s_xive_native.c   |   49 +++++-----
 6 files changed, 179 insertions(+), 57 deletions(-)
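
For reference, the event queue arithmetic used in the cover text can be sketched in C. This is only an illustration under the figures quoted above (1M EQs per socket, 8 EQs per VP, 2048 VPs reserved for the hypervisor). The helper name and macros are hypothetical and are not part of the kernel or of this series.

```c
#include <stdint.h>

/* Figures quoted in the cover letter. */
#define EQS_PER_SOCKET	(1ULL << 20)	/* 1M event queues per POWER9 socket */
#define EQS_PER_VP	8ULL		/* one EQ descriptor per priority */
#define HV_VPS		2048ULL		/* NR_CPUS VPs allocated by powernv */

/*
 * Hypothetical helper, not a kernel function: upper bound on the number
 * of XIVE KVM devices when each device allocates vps_per_vm VPs, looking
 * only at the event queue budget of nr_sockets sockets.
 */
static uint64_t max_xive_kvm_devices(uint64_t nr_sockets, uint64_t vps_per_vm)
{
	uint64_t total_eqs = nr_sockets * EQS_PER_SOCKET;
	uint64_t hv_eqs = HV_VPS * EQS_PER_VP;

	/* Nothing can be allocated with a zero block size or no EQs left. */
	if (vps_per_vm == 0 || total_eqs < hv_eqs)
		return 0;

	return (total_eqs - hv_eqs) / (vps_per_vm * EQS_PER_VP);
}
```

With these assumptions, max_xive_kvm_devices(2, 2048) gives 127, matching the limit stated above, and max_xive_kvm_devices(2, 256) gives 1016. The latter is a bound from the EQ budget alone and says nothing about other resources that may limit the number of VMs in practice.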