Re: [PATCH v3] KVM: remove buggy vcpu id check on vcpu creation

Radim Krčmář <rkrcmar@xxxxxxxxxx> · Thu, 21 Apr 2016 17:29:16 +0200

2016-04-21 13:29+0200, Greg Kurz:
> On Wed, 20 Apr 2016 20:29:09 +0200
> Radim Krčmář <rkrcmar@xxxxxxxxxx> wrote:
>> 2016-04-20 17:44+0200, Greg Kurz:
>> > Commit 338c7dbadd26 ("KVM: Improve create VCPU parameter (CVE-2013-4587)")
>> > introduced a check to prevent potential kernel memory corruption in case
>> > the vcpu id is too great.
>> > 
>> > Unfortunately this check assumes vcpu ids grow in sequence with a common
>> > difference of 1, which is wrong: archs are free to use vcpu id as they fit.
>> > For example, QEMU originated vcpu ids for PowerPC cpus running in boot3s_hv
>> > mode, can grow with a common difference of 2, 4 or 8: if KVM_MAX_VCPUS is
>> > 1024, guests may be limited down to 128 vcpus on POWER8.
>> > 
>> > This means the check does not belong here and should be moved to some arch
>> > specific function: kvm_arch_vcpu_create() looks like a good candidate.
>> > 
>> > ARM and s390 already have such a check.
>> > 
>> > I could not spot any path in the PowerPC or common KVM code where a vcpu
>> > id is used as described in the above commit: I believe PowerPC can live
>> > without this check.  
>> 
>> The only problematic path I see is kvm_get_vcpu_by_id(), which returns
>> NULL for any id above KVM_MAX_VCPUS.
> 
> Oops my bad, I started to work on a 4.4 tree and I missed this check brought
> by commit c896939f7cff (KVM: use heuristic for fast VCPU lookup by id).
> 
> But again, I believe the check is wrong there also: the changelog just mentions
> this is a fastpath for the usual case where "VCPU ids match the array index"...
> why does the patch add a NULL return path if id >= KVM_MAX_VCPUS ?

(The patch had to check id >= KVM_MAX_VCPUS for sanity and there could
 not be a VCPU with that index according to the spec, so it made a
 shortcut to the correct NULL result ...)

>> Second issue is that Documentation/virtual/kvm/api.txt says
>>   4.7 KVM_CREATE_VCPU
>>   [...]
>>   This API adds a vcpu to a virtual machine.  The vcpu id is a small
>>   integer in the range [0, max_vcpus).
>> 
> 
> Yeah and I find the meaning of max_vcpus is unclear.
> 
> Here it is considered as a limit for vcpu id, but if you look at the code,
> KVM_MAX_VCPUS is also used as a limit for the number of vcpus:
> 
> virt/kvm/kvm_main.c:    if (atomic_read(&kvm->online_vcpus) == KVM_MAX_VCPUS) {

I agree.  Naming of KVM_CAP_NR_VCPUS and KVM_CAP_MAX_VCPUS would make
you think that online_vcpus limit interpretation is the correct one, but
the code is conflicted.

>> so we'd remove those two lines and change the API too.  The change would
>> be somewhat backward compatible, but doesn't PowerPC use high vcpu_id
>> just because KVM is lacking an API to set DT ID?
> 
> This is related to a limitation when running in book3s_hv mode with cpus
> that support SMT (multiple HW threads): all HW threads within a core
> cannot be running in different guests at the same time. 
> 
> We solve this by using a vcpu numbering scheme as follows:
> 
> vcpu_id[N] = (N * thread_per_core_guest) / threads_per_core_host + N % threads_per_core_guest
> 
> where N means "the Nth vcpu presented to the guest". This allows to have groups of vcpus
> that can be scheduled to run on the same real core.
> 
> So, in the "worst" case where we want to run a guest with 1 thread/core and the host
> has 8 threads/core, we will need the vcpu_id limit to be 8*KVM_MAX_VCPUS.

I see, thanks.  Accommodating existing users seems like an acceptable
excuse to change the API.

>> x86 (APIC ID) is affected by this and ARM (MP ID) probably too.
>> 
> 
> x86 is limited to KVM_MAX_VCPUS (== 255) vcpus: it won't be affected if we also
> patch kvm_get_vcpu_by_id() like suggested above.

x86 vcpu_id encodes APIC ID and APIC ID encodes CPU topology by
reserving blocks of bits for socket/core/thread, so if core or thread
count isn't a power of two, then the set of valid APIC IDs is sparse,
but max id is still limited by 255, so the effective maximum VCPU count
is lower.

x86 doesn't support APIC ID over 255 yet, though, so this change
wouldn't change a thing in practice. :)