Don't set *this_cpu_ptr(hyperv_pcpu_input_arg) before the function set_memory_decrypted() returns, otherwise we run into this ticky issue: For a fully enlightened TDX/SNP VM, in hv_common_cpu_init(), *this_cpu_ptr(hyperv_pcpu_input_arg) is an encrypted page before the set_memory_decrypted() returns. When such a VM has more than 64 VPs, if the hyperv_pcpu_input_arg is not NULL, hv_common_cpu_init() -> set_memory_decrypted() -> ... -> cpa_flush() -> on_each_cpu() -> ... -> hv_send_ipi_mask() -> ... -> __send_ipi_mask_ex() tries to call hv_do_fast_hypercall16() with the hyperv_pcpu_input_arg as the hypercall input page, which must be a decrypted page in such a VM, but the page is still encrypted at this point, and a fatal fault is triggered. Fix the issue by setting *this_cpu_ptr(hyperv_pcpu_input_arg) after set_memory_decrypted(): if the hyperv_pcpu_input_arg is NULL, __send_ipi_mask_ex() returns HV_STATUS_INVALID_PARAMETER immediately, and hv_send_ipi_mask() falls back to orig_apic.send_IPI_mask(), which can use x2apic_send_IPI_all(), which may be slightly slower than the hypercall but still works correctly in such a VM. Signed-off-by: Dexuan Cui <decui@xxxxxxxxxxxxx> --- Changes in v2: None drivers/hv/hv_common.c | 30 +++++++++++++++++++++++------- 1 file changed, 23 insertions(+), 7 deletions(-) diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c index 897bbb96f4118..4c858e1636da7 100644 --- a/drivers/hv/hv_common.c +++ b/drivers/hv/hv_common.c @@ -360,6 +360,7 @@ int hv_common_cpu_init(unsigned int cpu) u64 msr_vp_index; gfp_t flags; int pgcount = hv_root_partition ? 2 : 1; + void *mem; int ret; /* hv_cpu_init() can be called with IRQs disabled from hv_resume() */ @@ -372,25 +373,40 @@ int hv_common_cpu_init(unsigned int cpu) * allocated if this CPU was previously online and then taken offline */ if (!*inputarg) { - *inputarg = kmalloc(pgcount * HV_HYP_PAGE_SIZE, flags); - if (!(*inputarg)) + mem = kmalloc(pgcount * HV_HYP_PAGE_SIZE, flags); + if (!mem) return -ENOMEM; if (hv_root_partition) { outputarg = (void **)this_cpu_ptr(hyperv_pcpu_output_arg); - *outputarg = (char *)(*inputarg) + HV_HYP_PAGE_SIZE; + *outputarg = (char *)mem + HV_HYP_PAGE_SIZE; } if (hv_isolation_type_en_snp() || hv_isolation_type_tdx()) { - ret = set_memory_decrypted((unsigned long)*inputarg, pgcount); + ret = set_memory_decrypted((unsigned long)mem, pgcount); if (ret) { - /* It may be unsafe to free *inputarg */ - *inputarg = NULL; + /* It may be unsafe to free 'mem' */ return ret; } - memset(*inputarg, 0x00, pgcount * PAGE_SIZE); + memset(mem, 0x00, pgcount * HV_HYP_PAGE_SIZE); } + + /* + * In a fully enlightened TDX/SNP VM with more than 64 VPs, if + * hyperv_pcpu_input_arg is not NULL, set_memory_decrypted() -> + * ... -> cpa_flush()-> ... -> __send_ipi_mask_ex() tries to + * use hyperv_pcpu_input_arg as the hypercall input page, which + * must be a decrypted page in such a VM, but the page is still + * encrypted before set_memory_decrypted() returns. Fix this by + * setting *inputarg after the above set_memory_decrypted(): if + * hyperv_pcpu_input_arg is NULL, __send_ipi_mask_ex() returns + * HV_STATUS_INVALID_PARAMETER immediately, and the function + * hv_send_ipi_mask() falls back to orig_apic.send_IPI_mask(), + * which may be slightly slower than the hypercall, but still + * works correctly in such a VM. + */ + *inputarg = mem; } msr_vp_index = hv_get_register(HV_REGISTER_VP_INDEX); -- 2.25.1