From: Dexuan Cui <decui@xxxxxxxxxxxxx> Sent: Friday, May 24, 2024 1:46 AM
>
> > From: Dave Hansen <dave.hansen@xxxxxxxxx>
> > Sent: Thursday, May 23, 2024 7:26 AM
> > [...]
> > On 5/22/24 19:24, Dexuan Cui wrote:
> > ...
> > > +static bool noinstr intel_cc_platform_td_l2(enum cc_attr attr)
> > > +{
> > > +	switch (attr) {
> > > +	case CC_ATTR_GUEST_MEM_ENCRYPT:
> > > +	case CC_ATTR_MEM_ENCRYPT:
> > > +		return true;
> > > +	default:
> > > +		return false;
> > > +	}
> > > +}
> > > +
> > >  static bool noinstr intel_cc_platform_has(enum cc_attr attr)
> > >  {
> > > +	if (tdx_partitioned_td_l2)
> > > +		return intel_cc_platform_td_l2(attr);
> > > +
> > >  	switch (attr) {
> > >  	case CC_ATTR_GUEST_UNROLL_STRING_IO:
> > >  	case CC_ATTR_HOTPLUG_DISABLED:
> >
> > On its face, this _looks_ rather troubling. It just hijacks all of the
> > attributes. It totally bifurcates the code. Anything that gets added
> > to intel_cc_platform_has() now needs to be considered for addition to
> > intel_cc_platform_td_l2().
>
> Maybe the bifurcation is necessary? TD mode is different from
> Partitioned TD mode (L2), after all. Another reason for the bifurcation
> is: currently online/offline'ing is disallowed for a TD VM, but actually
> Hyper-V is able to support CPU online/offline'ing for a TD VM in
> Partitioned TD mode (L2) -- how can we allow online/offline'ing for such
> a VM?
>
> BTW, the bifurcation code is copied from amd_cc_platform_has(), where
> an AMD SNP VM may run in the vTOM mode.
>
> > > --- a/arch/x86/mm/mem_encrypt_amd.c
> > > +++ b/arch/x86/mm/mem_encrypt_amd.c
> > ...
> > > @@ -529,7 +530,7 @@ void __init mem_encrypt_free_decrypted_mem(void)
> > >  	 * CC_ATTR_MEM_ENCRYPT, aren't necessarily equivalent in a Hyper-V VM
> > >  	 * using vTOM, where sme_me_mask is always zero.
> > >  	 */
> > > -	if (sme_me_mask) {
> > > +	if (sme_me_mask || (cc_vendor == CC_VENDOR_INTEL && !tdx_partitioned_td_l2)) {

FWIW, the above won't work in a kernel built with CONFIG_TDX_GUEST=y but CONFIG_AMD_MEM_ENCRYPT=n. mem_encrypt_free_decrypted_mem() in arch/x86/mm/mem_encrypt_amd.c won't get built, and an empty stub is used.

> > >  		r = set_memory_encrypted(vaddr, npages);
> > >  		if (r) {
> > >  			pr_warn("failed to free unused decrypted pages\n");
> >
> > If _ever_ there were a place for a new CC_ attribute, this would be it.
> Not sure how to add a new CC attribute for the __bss_decrypted support.
>
> For the cpu online/offline'ing support, I'm not sure how to add a new
> CC attribute and not introduce the bifurcation.
>
> > It's also a bit concerning that now we've got a (cc_vendor ==
> > CC_VENDOR_INTEL) check in an amd.c file.
> I agree my change here is ugly...
> Currently the __bss_decrypted support is only used for SNP.
> Not sure if we should get it to work for TDX as well.
>
> > So all of that on top of Kirill's "why do we need this in the first
> > place" questions leave me really scratching my head on this one.
> Probably I'll just use local APIC timer in such a VM or delay enabling
> Hyper-V TSC page to a later place where set_memory_decrypted()
> works for me. However, I still would like to find out how to allow
> CPU online/offline'ing for a TDX VM in Partitioned TD mode (L2).
>

My thoughts:

__bss_decrypted is named as if it applies to any CoCo VM, but really it is specific to AMD SEV. It was originally used for a GHCB page, which is SEV-specific, and then it proved to be convenient for the Hyper-V TSC page.

Ideally, we could fix __bss_decrypted to work generally in a TDX VM without any dependency on code specific to a hypervisor. But looking at some of the details, that may be non-trivial.
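The CONFIG_AMD_MEM_ENCRYPT=n problem can be seen in a small userspace model. This is a sketch, not kernel code: the CONFIG_* macros are defined by hand, the counter pages_reencrypted is a hypothetical stand-in for calls to set_memory_encrypted(), and the function body is a simplified copy of the patched condition. With CONFIG_AMD_MEM_ENCRYPT undefined, only the empty stub is compiled, so the new Intel check never runs.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Userspace model of a CONFIG_TDX_GUEST=y, CONFIG_AMD_MEM_ENCRYPT=n
 * build. The CONFIG_* macros and pages_reencrypted are stand-ins for
 * illustration only.
 */
#define CONFIG_TDX_GUEST 1
/* CONFIG_AMD_MEM_ENCRYPT is deliberately not defined. */

enum cc_vendor { CC_VENDOR_NONE, CC_VENDOR_AMD, CC_VENDOR_INTEL };

enum cc_vendor cc_vendor = CC_VENDOR_INTEL; /* a TDX guest */
bool tdx_partitioned_td_l2;                 /* false: not an L2 guest */
unsigned long sme_me_mask;                  /* zero in a TDX guest */
int pages_reencrypted; /* counts stand-in set_memory_encrypted() calls */

#ifdef CONFIG_AMD_MEM_ENCRYPT
/* Simplified patched logic from arch/x86/mm/mem_encrypt_amd.c. */
static void mem_encrypt_free_decrypted_mem(void)
{
	if (sme_me_mask ||
	    (cc_vendor == CC_VENDOR_INTEL && !tdx_partitioned_td_l2))
		pages_reencrypted++;
}
#else
/* mem_encrypt_amd.c is not built, so only the empty stub exists. */
static inline void mem_encrypt_free_decrypted_mem(void) { }
#endif
```

Calling mem_encrypt_free_decrypted_mem() in this configuration leaves pages_reencrypted at zero. Defining CONFIG_AMD_MEM_ENCRYPT makes the Intel branch reachable again, which is exactly the dependency on AMD-specific code that the patch introduces.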
A narrower solution is to remove the Hyper-V TSC page from __bss_decrypted, and use Hyper-V specific code on both TDX and SEV-SNP to decrypt just that page (not the entire __bss_decrypted), based on whether the Hyper-V guest is running with a paravisor.

From Dexuan's patch, it looks like set_memory_decrypted() works on TDX at the time that ms_hyperv_init_platform() runs. Does it also work on SEV-SNP? The code in kvm_init_platform() uses early_set_mem_enc_dec_hypercall() with kvm_sev_hc_page_enc_status(), which is SEV only. So maybe the normal set_memory_decrypted() doesn't work on SEV at that point, though I'm not at all clear on what kvm_init_platform() is trying to do. Shouldn't __bss_decrypted already be set up correctly?

The issue of taking CPUs offline is separate. Is the inability to take a CPU offline with TDX an architectural limitation? Or just a current Linux implementation limitation? And what about in an L2 TDX VM? If the existence of a limitation in an L2 TDX VM is dependent on the hypervisor/paravisor, then can cc_platform_has() check some architectural flag (that's independent of the host hypervisor) to know if it is running in an L2 TDX VM and return false for CC_ATTR_HOTPLUG_DISABLED? If a host/paravisor combo doesn't allow taking an L2 TDX VM CPU offline, then it would be up to that combo to implement the appropriate restriction. It's not hard to add a CPUHP state that would prevent it.

Michael
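A rough userspace model of the last suggestion follows; it is a sketch under stated assumptions, not kernel code. A single switch in the modeled intel_cc_platform_has() lets only CC_ATTR_HOTPLUG_DISABLED depend on an L2 flag, instead of bifurcating every attribute. td_l2_arch_flag and paravisor_allows_offline are hypothetical flags: the first stands for an architectural, hypervisor-independent L2 indication; the second for whatever a host/paravisor combo supports. model_cpu_down() is a hypothetical, simplified stand-in for the generic offline path. We assume the core refuses with -EOPNOTSUPP when the attribute is set; treat that as an assumption. paravisor_cpu_teardown() stands in for a CPUHP teardown callback whose error return vetoes the offline.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Attribute names mirror the kernel's enum cc_attr (others omitted). */
enum cc_attr {
	CC_ATTR_MEM_ENCRYPT,
	CC_ATTR_GUEST_MEM_ENCRYPT,
	CC_ATTR_HOTPLUG_DISABLED,
};

bool td_l2_arch_flag;          /* hypothetical architectural L2 flag */
bool paravisor_allows_offline; /* hypothetical host/paravisor policy */

#define NR_CPUS_MODEL 4
bool cpu_online_model[NR_CPUS_MODEL] = { true, true, true, true };

/* One switch; only the hotplug attribute depends on L1 vs. L2. */
bool intel_cc_platform_has(enum cc_attr attr)
{
	switch (attr) {
	case CC_ATTR_MEM_ENCRYPT:
	case CC_ATTR_GUEST_MEM_ENCRYPT:
		return true;
	case CC_ATTR_HOTPLUG_DISABLED:
		return !td_l2_arch_flag;
	default:
		return false;
	}
}

/*
 * Stand-in for a teardown callback that host/paravisor-specific code
 * would register as a CPUHP state: an error return vetoes the offline.
 */
static int paravisor_cpu_teardown(unsigned int cpu)
{
	(void)cpu;
	return paravisor_allows_offline ? 0 : -EBUSY;
}

/* Hypothetical, simplified model of the generic offline path. */
int model_cpu_down(unsigned int cpu)
{
	int ret;

	assert(cpu < NR_CPUS_MODEL);
	/* Generic check: the platform forbids hotplug outright. */
	if (intel_cc_platform_has(CC_ATTR_HOTPLUG_DISABLED))
		return -EOPNOTSUPP;
	/* Per-combo restriction: the teardown callback may refuse. */
	ret = paravisor_cpu_teardown(cpu);
	if (ret)
		return ret; /* offline rolled back; CPU stays online */
	cpu_online_model[cpu] = false;
	return 0;
}
```

In this model an L1 TD always gets -EOPNOTSUPP. An L2 guest gets -EBUSY only when the hypothetical paravisor policy refuses, so the restriction lives with the host/paravisor combo rather than in cc_platform_has().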