On Tue, 13 Aug 2019 09:50:27 +0100, Zenghui Yu <yuzenghui@xxxxxxxxxx> wrote: Hi Zenghui, > > Hi folks, > > Since commit e25028c8ded0 ("KVM: arm/arm64: Bump VGIC_V3_MAX_CPUS to > 512"), we seemed to be allowed to boot a 512U guest. But I failed to > start it up with the latest QEMU. I guess there are at least *two* > reasons (limitations). > > First I got a QEMU abort: > "kvm_set_irq: Invalid argument" > > Enable the trace_kvm_irq_line() under debugfs, when it comed with > vcpu-256, I got: > "Inject UNKNOWN interrupt (3), vcpu->idx: 0, num: 23, level: 0" > and kvm_vm_ioctl_irq_line() returns -EINVAL to user-space... > > So the thing is that we only have 8 bits for vcpu_index field ([23:16]) > in KVM_IRQ_LINE ioctl. irq_type field will be corrupted if we inject a > PPI to vcpu-256, whose vcpu_index will take 9 bits. Irk. Not great indeed. Clearly, we have a couple of holes in the way we test these ABI changes (/me eyes Eric...). > > I temporarily patched the KVM and QEMU with the following diff: > > ---8<--- > diff --git a/arch/arm64/include/uapi/asm/kvm.h > b/arch/arm64/include/uapi/asm/kvm.h > index 95516a4..39a0fb1 100644 > --- a/arch/arm64/include/uapi/asm/kvm.h > +++ b/arch/arm64/include/uapi/asm/kvm.h > @@ -325,10 +325,10 @@ struct kvm_vcpu_events { > #define KVM_ARM_VCPU_TIMER_IRQ_PTIMER 1 > > /* KVM_IRQ_LINE irq field index values */ > -#define KVM_ARM_IRQ_TYPE_SHIFT 24 > -#define KVM_ARM_IRQ_TYPE_MASK 0xff > +#define KVM_ARM_IRQ_TYPE_SHIFT 28 > +#define KVM_ARM_IRQ_TYPE_MASK 0xf > #define KVM_ARM_IRQ_VCPU_SHIFT 16 > -#define KVM_ARM_IRQ_VCPU_MASK 0xff > +#define KVM_ARM_IRQ_VCPU_MASK 0xfff > #define KVM_ARM_IRQ_NUM_SHIFT 0 > #define KVM_ARM_IRQ_NUM_MASK 0xffff > > ---8<--- > > It makes things a bit better, it also immediately BREAKs the api with > old versions. Yes, and we can't have that (specially if you consider that this API is shared between 32 and 64bit). One "get out of jail card" is to steal a few bits from the top of the word, and encode things there: diff --git a/arch/arm/include/uapi/asm/kvm.h b/arch/arm/include/uapi/asm/kvm.h index 4602464ebdfb..86db092e4c2f 100644 --- a/arch/arm/include/uapi/asm/kvm.h +++ b/arch/arm/include/uapi/asm/kvm.h @@ -254,8 +254,10 @@ struct kvm_vcpu_events { #define KVM_DEV_ARM_ITS_CTRL_RESET 4 /* KVM_IRQ_LINE irq field index values */ +#define KVM_ARM_IRQ_VCPU2_SHIFT 28 +#define KVM_ARM_IRQ_VCPU2_MASK 0xf #define KVM_ARM_IRQ_TYPE_SHIFT 24 -#define KVM_ARM_IRQ_TYPE_MASK 0xff +#define KVM_ARM_IRQ_TYPE_MASK 0xf #define KVM_ARM_IRQ_VCPU_SHIFT 16 #define KVM_ARM_IRQ_VCPU_MASK 0xff #define KVM_ARM_IRQ_NUM_SHIFT 0 diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h index 7b7ac0f6cec9..44cb25bfc95e 100644 --- a/arch/arm64/include/uapi/asm/kvm.h +++ b/arch/arm64/include/uapi/asm/kvm.h @@ -308,8 +308,10 @@ struct kvm_vcpu_events { #define KVM_ARM_VCPU_TIMER_IRQ_PTIMER 1 /* KVM_IRQ_LINE irq field index values */ +#define KVM_ARM_IRQ_VCPU2_SHIFT 28 +#define KVM_ARM_IRQ_VCPU2_MASK 0xf #define KVM_ARM_IRQ_TYPE_SHIFT 24 -#define KVM_ARM_IRQ_TYPE_MASK 0xff +#define KVM_ARM_IRQ_TYPE_MASK 0xf #define KVM_ARM_IRQ_VCPU_SHIFT 16 #define KVM_ARM_IRQ_VCPU_MASK 0xff #define KVM_ARM_IRQ_NUM_SHIFT 0 diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c index 90cedebaeb94..fb685c1c0514 100644 --- a/virt/kvm/arm/arm.c +++ b/virt/kvm/arm/arm.c @@ -889,6 +889,7 @@ int kvm_vm_ioctl_irq_line(struct kvm *kvm, struct kvm_irq_level *irq_level, irq_type = (irq >> KVM_ARM_IRQ_TYPE_SHIFT) & KVM_ARM_IRQ_TYPE_MASK; vcpu_idx = (irq >> KVM_ARM_IRQ_VCPU_SHIFT) & KVM_ARM_IRQ_VCPU_MASK; + vcpu_idx += ((irq >> KVM_ARM_IRQ_VCPU2_SHIFT) & KVM_ARM_IRQ_VCPU2_MASK) * (KVM_ARM_IRQ_VCPU_MASK + 1); irq_num = (irq >> KVM_ARM_IRQ_NUM_SHIFT) & KVM_ARM_IRQ_NUM_MASK; trace_kvm_irq_line(irq_type, vcpu_idx, irq_num, irq_level->level); It should work because we've been careful not to allow value outside of {0, 1, 2} for irq_type. I don't like it, but I really don't feel like adding another IRQ related ioctl. We still have 16 irq types (which is already a waste of space), and we can go up to 4096 vcpu. Peter, what do you think? > Next comes one more QEMU abort (with the "fix" above): > "Failed to set device address: No space left on device" > > We register two io devices (rd_dev and sgi_dev) on KVM_MMIO_BUS for > each redistributor. 512 vcpus take 1024 io devices, which is beyond the > maximum limitation of the current kernel - NR_IOBUS_DEVS (1000). > So we get a ENOSPC error here. I can reproduce that issue here ("499 vcpus on my Chromebook, baby"). Not an ABI problem though, and we can bump it up if that's needed. > I don't know if the similar problems have been discussed before in ML. > Is it time to really support the 512U guest? The real question is "why the hell would you want to do that?" ;-) Seriously, I'm very interested in finding out what is the use case for these gigantic VMs, other than debugging the kernel for big machines. Thanks, M. -- Jazz is not dead, it just smells funny. _______________________________________________ kvmarm mailing list kvmarm@xxxxxxxxxxxxxxxxxxxxx https://lists.cs.columbia.edu/mailman/listinfo/kvmarm