Re: [PATCH v8 9/9] KVM: VMX: enable IPI virtualization

Chao Gao <chao.gao@xxxxxxxxx> · Tue, 19 Apr 2022 08:00:13 +0800

On Mon, Apr 18, 2022 at 03:14:51PM +0000, Sean Christopherson wrote:
>On Mon, Apr 18, 2022, Chao Gao wrote:
>> On Fri, Apr 15, 2022 at 03:25:06PM +0000, Sean Christopherson wrote:
>> >On Mon, Apr 11, 2022, Zeng Guang wrote:
>> >> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> >> index d1a39285deab..23fbf52f7bea 100644
>> >> --- a/arch/x86/kvm/x86.c
>> >> +++ b/arch/x86/kvm/x86.c
>> >> @@ -11180,11 +11180,15 @@ static int sync_regs(struct kvm_vcpu *vcpu)
>> >>  
>> >>  int kvm_arch_vcpu_precreate(struct kvm *kvm, unsigned int id)
>> >>  {
>> >> +	int ret = 0;
>> >> +
>> >>  	if (kvm_check_tsc_unstable() && atomic_read(&kvm->online_vcpus) != 0)
>> >>  		pr_warn_once("kvm: SMP vm created on host with unstable TSC; "
>> >>  			     "guest TSC will not be reliable\n");
>> >>  
>> >> -	return 0;
>> >> +	if (kvm_x86_ops.alloc_ipiv_pid_table)
>> >> +		ret = static_call(kvm_x86_alloc_ipiv_pid_table)(kvm);
>> >
>> >Add a generic kvm_x86_ops.vcpu_precreate, no reason to make this so specific.
>> >And use KVM_X86_OP_RET0 instead of KVM_X86_OP_OPTIONAL, then this can simply be
>> >
>> >	return static_call(kvm_x86_vcpu_precreate);
>> >
>> >That said, there's a flaw in my genius plan.
>> >
>> >  1. KVM_CREATE_VM
>> >  2. KVM_CAP_MAX_VCPU_ID, set max_vcpu_ids=1
>> >  3. KVM_CREATE_VCPU, create IPIv table but ultimately fails
>> >  4. KVM decrements created_vcpus back to '0'
>> >  5. KVM_CAP_MAX_VCPU_ID, set max_vcpu_ids=4096
>> >  6. KVM_CREATE_VCPU w/ ID out of range
>> >
>> >In other words, malicious userspace could trigger buffer overflow.
>> 
>> can we simply return an error (e.g., -EEXIST) on step 5 (i.e.,
>> max_vcpu_ids cannot be changed after being set once)?
>> 
>> or
>> 
>> can we detect the change of max_vcpu_ids in step 6 and re-allocate PID
>> table?
>
>Returning an error is viable, but would be a rather odd ABI.  Re-allocating isn't
>a good option because the PID table could be in active use by other vCPUs, e.g.
>KVM would need to send a request and kick all vCPUs to have all vCPUs update their
>VMCS.
>
>And with both of those alternatives, I still don't like that every feature that
>acts on max_vcpu_ids would need to handle this same edge case.
>
>An alternative to another new ioctl() would be to to make KVM_CAP_MAX_VCPU_ID
>write-once, i.e. reject attempts to change the max once set (though we could allow
>re-writing the same value).  I think I like that idea better than adding an ioctl().
>
>It can even be done without an extra flag by zero-initializing the field and instead
>waiting until vCPU pre-create to lock in the value.  That would also help detect
>bad usage of max_vcpu_ids, especially if we added a wrapper to get the value, e.g.
>the wrapper could WARN_ON(!kvm->arch.max_vcpu_ids).

Yes, it looks simpler than adding an ioctl(). We will use this approach.