On Thu, Mar 28, 2024 at 12:33:35PM +1300, "Huang, Kai" <kai.huang@xxxxxxxxx> wrote:
> > +	kvm_tdx->tdr_pa = tdr_pa;
> > +
> > +	for_each_online_cpu(i) {
> > +		int pkg = topology_physical_package_id(i);
> > +
> > +		if (cpumask_test_and_set_cpu(pkg, packages))
> > +			continue;
> > +
> > +		/*
> > +		 * Program the memory controller in the package with an
> > +		 * encryption key associated to a TDX private host key id
> > +		 * assigned to this TDR.  Concurrent operations on same memory
> > +		 * controller results in TDX_OPERAND_BUSY.  Avoid this race by
> > +		 * mutex.
> > +		 */
>
> IIUC the race can only happen when you are creating multiple TDX guests
> simulatenously?  Please clarify this in the comment.
>
> And I even don't think you need all these TDX module details:
>
> 		/*
> 		 * Concurrent run of TDH.MNG.KEY.CONFIG on the same
> 		 * package resluts in TDX_OPERAND_BUSY.  When creating
> 		 * multiple TDX guests simultaneously this can run
> 		 * concurrently.  Take the per-package lock to
> 		 * serialize.
> 		 */

As pointed out by Chao, those mutexes will be dropped.
https://lore.kernel.org/kvm/ZfpwIespKy8qxWWE@chao-email/
We will also simplify the cpumask handling that tracks which packages are
online/offline and which CPU to use for each package.

> > +		mutex_lock(&tdx_mng_key_config_lock[pkg]);
> > +		ret = smp_call_on_cpu(i, tdx_do_tdh_mng_key_config,
> > +				      &kvm_tdx->tdr_pa, true);
> > +		mutex_unlock(&tdx_mng_key_config_lock[pkg]);
> > +		if (ret)
> > +			break;
> > +	}
> > +	cpus_read_unlock();
> > +	free_cpumask_var(packages);
> > +	if (ret) {
> > +		i = 0;
> > +		goto teardown;
> > +	}
> > +
> > +	kvm_tdx->tdcs_pa = tdcs_pa;
> > +	for (i = 0; i < tdx_info->nr_tdcs_pages; i++) {
> > +		err = tdh_mng_addcx(kvm_tdx->tdr_pa, tdcs_pa[i]);
> > +		if (err == TDX_RND_NO_ENTROPY) {
> > +			/* Here it's hard to allow userspace to retry. */
> > +			ret = -EBUSY;
> > +			goto teardown;
> > +		}
> > +		if (WARN_ON_ONCE(err)) {
> > +			pr_tdx_error(TDH_MNG_ADDCX, err, NULL);
> > +			ret = -EIO;
> > +			goto teardown;
> > +		}
> > +	}
> > +
> > +	/*
> > +	 * Note, TDH_MNG_INIT cannot be invoked here.  TDH_MNG_INIT requires a dedicated
> > +	 * ioctl() to define the configure CPUID values for the TD.
> > +	 */
>
> Then, how about renaming this function to __tdx_td_create()?

So do we also want to rename the ioctl for consistency, i.e.
KVM_TDX_INIT_VM => KVM_TDX_CREATE_VM?  I don't have a strong opinion on
those names.  Maybe KVM_TDX_{INIT, CREATE, or CONFIG}_VM?  And we can
rename the function to match it.

> > +	return 0;
> > +
> > +	/*
> > +	 * The sequence for freeing resources from a partially initialized TD
> > +	 * varies based on where in the initialization flow failure occurred.
> > +	 * Simply use the full teardown and destroy, which naturally play nice
> > +	 * with partial initialization.
> > +	 */
> > +teardown:
> > +	for (; i < tdx_info->nr_tdcs_pages; i++) {
> > +		if (tdcs_pa[i]) {
> > +			free_page((unsigned long)__va(tdcs_pa[i]));
> > +			tdcs_pa[i] = 0;
> > +		}
> > +	}
> > +	if (!kvm_tdx->tdcs_pa)
> > +		kfree(tdcs_pa);
>
> The code to "free TDCS pages in a loop and free the array" is done below
> with duplicated code.  I am wondering whether we have way to eliminate one.
>
> But I have lost track here, so perhaps we can review again after we split
> the patch to smaller pieces.

Surely we can simplify it.  Originally we had a spin lock, so I had to
separate the blocking memory allocation from its usage with this error
cleanup path.  Now that it's a mutex, we can mix the page allocation with
its usage.
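One way to eliminate the duplication could be a small helper that frees the
TDCS pages and the array, called from both error paths.  A rough, untested
sketch (the helper name is made up; everything else is taken from this
patch):

static void tdx_free_tdcs_pages(struct kvm_tdx *kvm_tdx, unsigned long *tdcs_pa)
{
	int i;

	if (!tdcs_pa)
		return;

	/* Free each TDCS page, then the array of physical addresses itself. */
	for (i = 0; i < tdx_info->nr_tdcs_pages; i++) {
		if (tdcs_pa[i])
			free_page((unsigned long)__va(tdcs_pa[i]));
	}
	kfree(tdcs_pa);
	kvm_tdx->tdcs_pa = NULL;
}

The teardown: path isn't a drop-in user, though, since it keeps the array
when kvm_tdx->tdcs_pa is already set, so this is easier to sort out once
the patch is split into smaller pieces.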
> > +	tdx_mmu_release_hkid(kvm);
> > +	tdx_vm_free(kvm);
> > +	return ret;
> > +
> > +free_packages:
> > +	cpus_read_unlock();
> > +	free_cpumask_var(packages);
> > +free_tdcs:
> > +	for (i = 0; i < tdx_info->nr_tdcs_pages; i++) {
> > +		if (tdcs_pa[i])
> > +			free_page((unsigned long)__va(tdcs_pa[i]));
> > +	}
> > +	kfree(tdcs_pa);
> > +	kvm_tdx->tdcs_pa = NULL;
> > +
> > +free_tdr:
> > +	if (tdr_pa)
> > +		free_page((unsigned long)__va(tdr_pa));
> > +	kvm_tdx->tdr_pa = 0;
> > +free_hkid:
> > +	if (is_hkid_assigned(kvm_tdx))
> > +		tdx_hkid_free(kvm_tdx);
> > +	return ret;
> > +}
> > +
> >  int tdx_vm_ioctl(struct kvm *kvm, void __user *argp)
> >  {
> >  	struct kvm_tdx_cmd tdx_cmd;
> > @@ -215,12 +664,13 @@ static int tdx_md_read(struct tdx_md_map *maps, int nr_maps)
> >  static int __init tdx_module_setup(void)
> >  {
> > -	u16 num_cpuid_config;
> > +	u16 num_cpuid_config, tdcs_base_size;
> >  	int ret;
> >  	u32 i;
> >  	struct tdx_md_map mds[] = {
> >  		TDX_MD_MAP(NUM_CPUID_CONFIG, &num_cpuid_config),
> > +		TDX_MD_MAP(TDCS_BASE_SIZE, &tdcs_base_size),
> >  	};
> >  	struct tdx_metadata_field_mapping fields[] = {
> > @@ -273,6 +723,8 @@ static int __init tdx_module_setup(void)
> >  		c->edx = ecx_edx >> 32;
> >  	}
> > +	tdx_info->nr_tdcs_pages = tdcs_base_size / PAGE_SIZE;
> > +
>
> Round up the 'tdcs_base_size' to make sure you have enough room, or put a
> WARN() here if not page aligned?

Ok, will add the round up.  Same for tdvps_base_size.
I can't find anything in the TDX spec about the relationship between those
sizes and the page size.  Although TDH.MNG.ADDCX() and TDH.VP.ADDCX() imply
that the sizes are multiples of PAGE_SIZE, the spec doesn't guarantee it.
I think a silent round up is better than WARN() because we can do nothing
about the values the TDX module provides.

> >  	return 0;
> >  error_out:
> > @@ -319,13 +771,27 @@ int __init tdx_hardware_setup(struct kvm_x86_ops *x86_ops)
> >  	struct tdx_enabled enable = {
> >  		.err = ATOMIC_INIT(0),
> >  	};
> > +	int max_pkgs;
> >  	int r = 0;
> > +	int i;
>
> Nit: you can put the 3 into one line.
>
> > +	if (!cpu_feature_enabled(X86_FEATURE_MOVDIR64B)) {
> > +		pr_warn("MOVDIR64B is reqiured for TDX\n");
>
> It's better to make it more clear:
>
> "Disable TDX: MOVDIR64B is not supported or disabled by the kernel."
>
> Or, to match below:
>
> "Cannot enable TDX w/o MOVDIR64B".

Ok.

> > +		return -EOPNOTSUPP;
> > +	}
> >  	if (!enable_ept) {
> >  		pr_warn("Cannot enable TDX with EPT disabled\n");
> >  		return -EINVAL;
> >  	}
> > +	max_pkgs = topology_max_packages();
> > +	tdx_mng_key_config_lock = kcalloc(max_pkgs, sizeof(*tdx_mng_key_config_lock),
> > +					  GFP_KERNEL);
> > +	if (!tdx_mng_key_config_lock)
> > +		return -ENOMEM;
> > +	for (i = 0; i < max_pkgs; i++)
> > +		mutex_init(&tdx_mng_key_config_lock[i]);
> > +
>
> Using a per-socket lock looks a little bit overkill to me.  I don't know
> whether we need to do in the initial version.  Will leave to others.
>
> Please at least add a comment to explain this is for better performance when
> creating multiple TDX guests IIUC?

Will delete the mutex and simplify the related logic.

-- 
Isaku Yamahata <isaku.yamahata@xxxxxxxxx>