Hi Marc,

On 8/22/19 11:29 AM, Marc Zyngier wrote:
> Hi Eric,
>
> On 22/08/2019 10:08, Auger Eric wrote:
>> Hi Zenghui,
>>
>> On 8/13/19 10:50 AM, Zenghui Yu wrote:
>>> Hi folks,
>>>
>>> Since commit e25028c8ded0 ("KVM: arm/arm64: Bump VGIC_V3_MAX_CPUS to
>>> 512"), we seemed to be allowed to boot a 512U guest. But I failed to
>>> start it up with the latest QEMU. I guess there are at least *two*
>>> reasons (limitations).
>>>
>>> First I got a QEMU abort:
>>> "kvm_set_irq: Invalid argument"
>>>
>>> Enable the trace_kvm_irq_line() under debugfs, when it comed with
>>> vcpu-256, I got:
>>> "Inject UNKNOWN interrupt (3), vcpu->idx: 0, num: 23, level: 0"
>>> and kvm_vm_ioctl_irq_line() returns -EINVAL to user-space...
>>>
>>> So the thing is that we only have 8 bits for vcpu_index field ([23:16])
>>> in KVM_IRQ_LINE ioctl. irq_type field will be corrupted if we inject a
>>> PPI to vcpu-256, whose vcpu_index will take 9 bits.
>>>
>>> I temporarily patched the KVM and QEMU with the following diff:
>>>
>>> ---8<---
>>> diff --git a/arch/arm64/include/uapi/asm/kvm.h
>>> b/arch/arm64/include/uapi/asm/kvm.h
>>> index 95516a4..39a0fb1 100644
>>> --- a/arch/arm64/include/uapi/asm/kvm.h
>>> +++ b/arch/arm64/include/uapi/asm/kvm.h
>>> @@ -325,10 +325,10 @@ struct kvm_vcpu_events {
>>>  #define KVM_ARM_VCPU_TIMER_IRQ_PTIMER    1
>>>
>>>  /* KVM_IRQ_LINE irq field index values */
>>> -#define KVM_ARM_IRQ_TYPE_SHIFT    24
>>> -#define KVM_ARM_IRQ_TYPE_MASK     0xff
>>> +#define KVM_ARM_IRQ_TYPE_SHIFT    28
>>> +#define KVM_ARM_IRQ_TYPE_MASK     0xf
>>>  #define KVM_ARM_IRQ_VCPU_SHIFT    16
>>> -#define KVM_ARM_IRQ_VCPU_MASK     0xff
>>> +#define KVM_ARM_IRQ_VCPU_MASK     0xfff
>>>  #define KVM_ARM_IRQ_NUM_SHIFT     0
>>>  #define KVM_ARM_IRQ_NUM_MASK      0xffff
>>>
>>> ---8<---
>>>
>>> It makes things a bit better, it also immediately BREAKs the api with
>>> old versions.
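
For reference, a minimal sketch of how the current layout turns a PPI for
vcpu-256 into the "UNKNOWN interrupt (3)" seen in the trace above. The
shift/mask values are the pre-patch ones from the quoted diff, and
KVM_ARM_IRQ_TYPE_PPI is 2 in the uapi header. pack_irq() and the irq_*()
accessors are hypothetical names used only for illustration, and the sketch
assumes userspace ORs the fields together without masking vcpu_index.

```c
#include <assert.h>
#include <stdint.h>

/* Pre-patch layout, as shown on the '-' and context lines of the diff. */
#define KVM_ARM_IRQ_TYPE_SHIFT  24
#define KVM_ARM_IRQ_TYPE_MASK   0xff
#define KVM_ARM_IRQ_VCPU_SHIFT  16
#define KVM_ARM_IRQ_VCPU_MASK   0xff
#define KVM_ARM_IRQ_NUM_SHIFT   0
#define KVM_ARM_IRQ_NUM_MASK    0xffff
#define KVM_ARM_IRQ_TYPE_PPI    2

/* Hypothetical helper: build the irq field with no masking of the inputs. */
static uint32_t pack_irq(uint32_t type, uint32_t vcpu, uint32_t num)
{
    return (type << KVM_ARM_IRQ_TYPE_SHIFT) |
           (vcpu << KVM_ARM_IRQ_VCPU_SHIFT) |
           (num  << KVM_ARM_IRQ_NUM_SHIFT);
}

/* Hypothetical accessors: decode each field with its shift and mask. */
static uint32_t irq_type(uint32_t irq)
{
    return (irq >> KVM_ARM_IRQ_TYPE_SHIFT) & KVM_ARM_IRQ_TYPE_MASK;
}

static uint32_t irq_vcpu(uint32_t irq)
{
    return (irq >> KVM_ARM_IRQ_VCPU_SHIFT) & KVM_ARM_IRQ_VCPU_MASK;
}

static uint32_t irq_num(uint32_t irq)
{
    return (irq >> KVM_ARM_IRQ_NUM_SHIFT) & KVM_ARM_IRQ_NUM_MASK;
}

/* vcpu index 256 (0x100) needs 9 bits: bit 8 lands on bit 24, the lowest
 * bit of the type field, so PPI (2) decodes as 3 and the vcpu as 0. */
static void check_vcpu256_ppi(void)
{
    uint32_t irq = pack_irq(KVM_ARM_IRQ_TYPE_PPI, 256, 23);

    assert(irq == 0x03000017);
    assert(irq_type(irq) == 3);
    assert(irq_vcpu(irq) == 0);
    assert(irq_num(irq) == 23);
}
```

Calling check_vcpu256_ppi() from a main() reproduces the decoded values in
the trace: type 3, vcpu->idx 0, num 23.
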
>>>
>>>
>>> Next comes one more QEMU abort (with the "fix" above):
>>> "Failed to set device address: No space left on device"
>>>
>>> We register two io devices (rd_dev and sgi_dev) on KVM_MMIO_BUS for
>>> each redistributor. 512 vcpus take 1024 io devices, which is beyond the
>>> maximum limitation of the current kernel - NR_IOBUS_DEVS (1000).
>>> So we get a ENOSPC error here.
>>
>> Do you plan to send a patch for increasing the NR_IOBUS_DEVS? Otherwise
>> I can do it.
>
> I really wonder whether that's a sensible thing to do on its own.
>
> Looking at the implementation of kvm_io_bus_register_dev (which copies
> the whole array each time we insert a device), we have an obvious issue
> with systems that create a large number of device structures, leading to
> large transient memory usage and slow guest start.
>
> We could also try and reduce the number of devices we insert by making
> the redistributor a single device (which it is in reality). It probably
> means we need to make the MMIO decoding more flexible.

Yes it makes sense. If no objection, I can work on this as I am the
source of the mess ;-)

Thanks

Eric
>
> Thanks,
>
> M.
>
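
For scale, a rough sketch of the two numbers behind the second abort: the
io device count for 512 vcpus against NR_IOBUS_DEVS, and how much copying a
copy-on-insert array implies. This is a simplified model, not the kernel
code: it only counts the redistributor devices, and it assumes each insert
copies every entry already registered. The helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define NR_IOBUS_DEVS    1000  /* limit quoted in the thread */
#define DEVS_PER_REDIST  2     /* rd_dev and sgi_dev */

/* io devices needed for the redistributors of nr_vcpus vcpus. */
static size_t redist_io_devs(size_t nr_vcpus)
{
    return nr_vcpus * DEVS_PER_REDIST;
}

/* Simplified model of copy-on-insert: the i-th insert copies the i entries
 * already present, so n inserts copy 0 + 1 + ... + (n - 1) entries. */
static size_t total_entries_copied(size_t nr_devs)
{
    size_t copied = 0;

    for (size_t i = 0; i < nr_devs; i++)
        copied += i;
    return copied;
}

/* 512 vcpus need 1024 devices, which exceeds the limit; registering them
 * one by one would copy 1024 * 1023 / 2 entries under this model. */
static void check_512_vcpus(void)
{
    assert(redist_io_devs(512) == 1024);
    assert(redist_io_devs(512) > NR_IOBUS_DEVS);
    assert(total_entries_copied(1024) == 523776);
}
```

Under the same model, a single device per redistributor would halve the
count to 512 and cut the copied entries roughly fourfold (512 * 511 / 2).
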