On Fri, Jan 14, 2022, Zeng Guang wrote:
> On 1/14/2022 6:09 AM, Sean Christopherson wrote:
> > On Fri, Dec 31, 2021, Zeng Guang wrote:
> > > +static int vmx_expand_pid_table(struct kvm_vmx *kvm_vmx, int entry_idx)
> > > +{
> > > +	u64 *last_pid_table;
> > > +	int last_table_size, new_order;
> > > +
> > > +	if (entry_idx <= kvm_vmx->pid_last_index)
> > > +		return 0;
> > > +
> > > +	last_pid_table = kvm_vmx->pid_table;
> > > +	last_table_size = table_index_to_size(kvm_vmx->pid_last_index + 1);
> > > +	new_order = get_order(table_index_to_size(entry_idx + 1));
> > > +
> > > +	if (vmx_alloc_pid_table(kvm_vmx, new_order))
> > > +		return -ENOMEM;
> > > +
> > > +	memcpy(kvm_vmx->pid_table, last_pid_table, last_table_size);
> > > +	kvm_make_all_cpus_request(&kvm_vmx->kvm, KVM_REQ_PID_TABLE_UPDATE);
> > > +
> > > +	/* Now old PID table can be freed safely as no vCPU is using it. */
> > > +	free_pages((unsigned long)last_pid_table, get_order(last_table_size));
> >
> > This is terrifying.  I think it's safe?  But it's still terrifying.
>
> Free old PID table here is safe as kvm making request KVM_REQ_PI_TABLE_UPDATE
> with KVM_REQUEST_WAIT flag force all vcpus trigger vm-exit to update vmcs
> field to new allocated PID table. At this time, it makes sure old PID table
> not referenced by any vcpu.
> Do you mean it still has potential problem?

No, I do think it's safe, but it is still terrifying :-)

> > Rather than dynamically react as vCPUs are created, what about we make max_vcpus
> > common[*], extend KVM_CAP_MAX_VCPUS to allow userspace to override max_vcpus,
> > and then have the IPIv support allocate the PID table on first vCPU creation
> > instead of in vmx_vm_init()?
> >
> > That will give userspace an opportunity to lower max_vcpus to reduce memory
> > consumption without needing to dynamically muck with the table in KVM.  Then
> > this entire patch goes away.
>
> IIUC, it's risky if relying on userspace.

That's why we have cgroups, rlimits, etc...

> In this way userspace also have chance to assign large max_vcpus but not use
> them at all. This cannot approach the goal to save memory as much as possible
> just similar as using KVM_MAX_VCPU_IDS to allocate PID table.

Userspace can simply do KVM_CREATE_VCPU until it hits KVM_MAX_VCPU_IDS...
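
To make the alternative concrete, here is a rough userspace model of the idea, not kernel code and not taken from any posted patch. Everything in it is a hypothetical stand-in: `struct vm_sketch`, `vm_set_max_vcpus()`, `vm_create_vcpu()` and `SKETCH_MAX_VCPU_IDS` are invented for illustration, and it assumes one 64-bit table entry per possible vCPU ID. The table is allocated exactly once, on the first vCPU creation, sized by whatever max_vcpus userspace settled on, so there is no copy-and-free of a live table.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical hard limit; stands in for KVM_MAX_VCPU_IDS. */
#define SKETCH_MAX_VCPU_IDS 4096

/* Hypothetical per-VM state; not the real struct kvm / struct kvm_vmx. */
struct vm_sketch {
	unsigned int max_vcpus;	/* cap userspace may lower before creating vCPUs */
	unsigned int nr_vcpus;	/* vCPUs created so far */
	uint64_t *pid_table;	/* NULL until the first vCPU is created */
};

static void vm_init(struct vm_sketch *vm)
{
	assert(vm != NULL);
	vm->max_vcpus = SKETCH_MAX_VCPU_IDS;	/* default to the hard limit */
	vm->nr_vcpus = 0;
	vm->pid_table = NULL;			/* nothing allocated at VM init */
}

/* Models a userspace override of max_vcpus; rejected once a vCPU exists. */
static int vm_set_max_vcpus(struct vm_sketch *vm, unsigned int max)
{
	if (vm->nr_vcpus || !max || max > SKETCH_MAX_VCPU_IDS)
		return -EINVAL;

	vm->max_vcpus = max;
	return 0;
}

/* Allocates the table on first use; it never grows or moves afterwards. */
static int vm_create_vcpu(struct vm_sketch *vm, unsigned int vcpu_id)
{
	if (vcpu_id >= vm->max_vcpus)
		return -EINVAL;

	if (!vm->pid_table) {
		vm->pid_table = calloc(vm->max_vcpus, sizeof(*vm->pid_table));
		if (!vm->pid_table)
			return -ENOMEM;
	}

	/* Duplicate vCPU ID checks are omitted to keep the sketch short. */
	vm->nr_vcpus++;
	return 0;
}

static void vm_destroy(struct vm_sketch *vm)
{
	free(vm->pid_table);
	vm->pid_table = NULL;
	vm->nr_vcpus = 0;
}
```

In this model, a VMM that lowers the cap to 8 before creating vCPUs gets an 8-entry table, and one that leaves the default pays for the full table, which is the same worst case the dynamic approach reaches if userspace keeps creating vCPUs.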