On Mon, Apr 18, 2022 at 03:14:51PM +0000, Sean Christopherson wrote: >On Mon, Apr 18, 2022, Chao Gao wrote: >> On Fri, Apr 15, 2022 at 03:25:06PM +0000, Sean Christopherson wrote: >> >On Mon, Apr 11, 2022, Zeng Guang wrote: >> >> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c >> >> index d1a39285deab..23fbf52f7bea 100644 >> >> --- a/arch/x86/kvm/x86.c >> >> +++ b/arch/x86/kvm/x86.c >> >> @@ -11180,11 +11180,15 @@ static int sync_regs(struct kvm_vcpu *vcpu) >> >> >> >> int kvm_arch_vcpu_precreate(struct kvm *kvm, unsigned int id) >> >> { >> >> + int ret = 0; >> >> + >> >> if (kvm_check_tsc_unstable() && atomic_read(&kvm->online_vcpus) != 0) >> >> pr_warn_once("kvm: SMP vm created on host with unstable TSC; " >> >> "guest TSC will not be reliable\n"); >> >> >> >> - return 0; >> >> + if (kvm_x86_ops.alloc_ipiv_pid_table) >> >> + ret = static_call(kvm_x86_alloc_ipiv_pid_table)(kvm); >> > >> >Add a generic kvm_x86_ops.vcpu_precreate, no reason to make this so specific. >> >And use KVM_X86_OP_RET0 instead of KVM_X86_OP_OPTIONAL, then this can simply be >> > >> > return static_call(kvm_x86_vcpu_precreate); >> > >> >That said, there's a flaw in my genius plan. >> > >> > 1. KVM_CREATE_VM >> > 2. KVM_CAP_MAX_VCPU_ID, set max_vcpu_ids=1 >> > 3. KVM_CREATE_VCPU, create IPIv table but ultimately fails >> > 4. KVM decrements created_vcpus back to '0' >> > 5. KVM_CAP_MAX_VCPU_ID, set max_vcpu_ids=4096 >> > 6. KVM_CREATE_VCPU w/ ID out of range >> > >> >In other words, malicious userspace could trigger buffer overflow. >> >> can we simply return an error (e.g., -EEXIST) on step 5 (i.e., >> max_vcpu_ids cannot be changed after being set once)? >> >> or >> >> can we detect the change of max_vcpu_ids in step 6 and re-allocate PID >> table? > >Returning an error is viable, but would be a rather odd ABI. Re-allocating isn't >a good option because the PID table could be in active use by other vCPUs, e.g. >KVM would need to send a request and kick all vCPUs to have all vCPUs update their >VMCS. > >And with both of those alternatives, I still don't like that every feature that >acts on max_vcpu_ids would need to handle this same edge case. > >An alternative to another new ioctl() would be to to make KVM_CAP_MAX_VCPU_ID >write-once, i.e. reject attempts to change the max once set (though we could allow >re-writing the same value). I think I like that idea better than adding an ioctl(). > >It can even be done without an extra flag by zero-initializing the field and instead >waiting until vCPU pre-create to lock in the value. That would also help detect >bad usage of max_vcpu_ids, especially if we added a wrapper to get the value, e.g. >the wrapper could WARN_ON(!kvm->arch.max_vcpu_ids). Yes, it looks simpler than adding an ioctl(). We will use this approach.