On Wed, Jul 25, 2018 at 04:12:02PM +1000, Sam Bobroff wrote:
> From: Sam Bobroff <sam.bobroff@xxxxxxxxxxx>
>
> It is not currently possible to create the full number of possible
> VCPUs (KVM_MAX_VCPUS) on Power9 with KVM-HV when the guest uses less
> threads per core than it's core stride (or "VSMT mode"). This is
> because the VCORE ID and XIVE offsets to grow beyond KVM_MAX_VCPUS
> even though the VCPU ID is less than KVM_MAX_VCPU_ID.
>
> To address this, "pack" the VCORE ID and XIVE offsets by using
> knowledge of the way the VCPU IDs will be used when there are less
> guest threads per core than the core stride. The primary thread of
> each core will always be used first. Then, if the guest uses more than
> one thread per core, these secondary threads will sequentially follow
> the primary in each core.
>
> So, the only way an ID above KVM_MAX_VCPUS can be seen, is if the
> VCPUs are being spaced apart, so at least half of each core is empty
> and IDs between KVM_MAX_VCPUS and (KVM_MAX_VCPUS * 2) can be mapped
> into the second half of each core (4..7, in an 8-thread core).
>
> Similarly, if IDs above KVM_MAX_VCPUS * 2 are seen, at least 3/4 of
> each core is being left empty, and we can map down into the second and
> third quarters of each core (2, 3 and 5, 6 in an 8-thread core).
>
> Lastly, if IDs above KVM_MAX_VCPUS * 4 are seen, only the primary
> threads are being used and 7/8 of the core is empty, allowing use of
> the 1, 5, 3 and 7 thread slots.
>
> (Strides less than 8 are handled similarly.)
>
> This allows the VCORE ID or offset to be calculated quickly from the
> VCPU ID or XIVE server numbers, without access to the VCPU structure.
>
> Signed-off-by: Sam Bobroff <sam.bobroff@xxxxxxxxxxx>

I noticed a problem:

> @@ -1989,10 +1989,15 @@ static struct kvm_vcpu *kvmppc_core_vcpu_create_hv(struct kvm *kvm,
>                                                      unsigned int id)
>  {
>          struct kvm_vcpu *vcpu;
> -        int err;
> +        int err = -EINVAL;
>          int core;
>          struct kvmppc_vcore *vcore;
>
> +        if (id >= (KVM_MAX_VCPUS * kvm->arch.emul_smt_mode)) {
> +                WARN_ONCE(true, "DNCI: VCPU ID too high\n");
> +                goto out;
> +        }

On POWER8, kvm->arch.emul_smt_mode will be either 1 or 0, so this test
needs to be conditional on CPU_FTR_ARCH_300.  I'll fix it.

Also, kvm->arch.emul_smt_mode can change at any time until the first
vcore is created, so this test should be done while holding kvm->lock.

Paul.
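
For anyone following the thread, the packing described in the quoted
commit message can be sketched roughly as below. This is a standalone
userspace illustration under stated assumptions, not the code from the
patch: SKETCH_MAX_VCPUS, SKETCH_MAX_THREADS, sketch_pack_vcpu_id() and
the block_offsets table are hypothetical names and values, chosen so
that IDs fold first into the second half of each core, then into the
quarters, then into the remaining odd slots.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical small values for illustration; the kernel's constants differ. */
#define SKETCH_MAX_VCPUS   16u
#define SKETCH_MAX_THREADS 8u

/*
 * Fold a sparse VCPU ID into the range [0, SKETCH_MAX_VCPUS).
 * 'stride' is the core stride (the "VSMT mode") and is assumed to be a
 * power of two no larger than SKETCH_MAX_THREADS.  The caller is assumed
 * to pass only IDs that can occur when the primary thread of each core
 * is used first and secondary threads follow sequentially.
 */
static uint32_t sketch_pack_vcpu_id(uint32_t id, uint32_t stride)
{
    /* Assumed table: slot used by each successive block of IDs. */
    static const uint32_t block_offsets[SKETCH_MAX_THREADS] = {
        0, 4, 2, 6, 1, 5, 3, 7
    };
    uint32_t block, packed;

    assert(stride >= 1 && stride <= SKETCH_MAX_THREADS);

    /* Which block of SKETCH_MAX_VCPUS IDs this ID falls in, scaled so
     * that smaller strides skip the slots they cannot use. */
    block = (id / SKETCH_MAX_VCPUS) * (SKETCH_MAX_THREADS / stride);
    assert(block < SKETCH_MAX_THREADS);

    packed = (id % SKETCH_MAX_VCPUS) + block_offsets[block];
    assert(packed < SKETCH_MAX_VCPUS);
    return packed;
}
```

With these assumed values, a guest using one thread per core and a
stride of 8 has VCPU IDs 0, 8, 16, ..., 120, and the sketch maps them
onto sixteen distinct packed IDs below SKETCH_MAX_VCPUS (for example
16 goes to 4, 32 to 2 and 64 to 1).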