When a TDX guest runs on Hyper-V, the hv_netvsc driver's
netvsc_init_buf() allocates buffers using vzalloc(), and needs to share
the buffers with the host OS by calling set_memory_decrypted(), which
does not work for vmalloc() addresses yet. Add the support by handling
the pages one by one, since a vmalloc() range is not guaranteed to be
physically contiguous.

Co-developed-by: Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx>
Reviewed-by: Michael Kelley <mikelley@xxxxxxxxxxxxx>
Signed-off-by: Dexuan Cui <decui@xxxxxxxxxxxxx>
---
 arch/x86/coco/tdx/tdx.c | 76 ++++++++++++++++++++++++++++-------------
 1 file changed, 52 insertions(+), 24 deletions(-)

Changes in v2:
  Changed tdx_enc_status_changed() in place.

Hi, Dave, I checked the huge vmalloc mapping code, but still don't know
how to get the underlying huge page info (if a huge page is in use) so
that try_accept_page() could use PG_LEVEL_2M/1G for vmalloc: I checked
is_vm_area_hugepages() and __vfree() -> __vunmap(), and I think the
underlying page allocation info is internal to the mm code, and there is
no mm API for me to get the info in tdx_enc_status_changed().

Changes in v3:
  No change since v2.

Changes in v4:
  Added Kirill's Co-developed-by since Kirill helped to improve the
    code by adding tdx_enc_status_changed_phys().

  Thanks Kirill for the clarification on load_unaligned_zeropad()!

  The vzalloc() usage in drivers/net/hyperv/netvsc.c: netvsc_init_buf()
    remains the same. It may not be worth it to "allocate a vmalloc
    region, allocate pages manually", because we have to consider the
    worst case where the system is suffering from severe memory
    fragmentation and we can only allocate multiple single pages. We may
    not want to complicate the code in netvsc_init_buf(). We'll support
    NIC SR-IOV for TDX VMs on Hyper-V, so the netvsc send/recv buffers
    won't be used when the VF NIC is up.

Changes in v5:
  Added Kirill's Signed-off-by.
  Added Michael's Reviewed-by.
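
For illustration only, here is a minimal userspace sketch of the
page-by-page walk described in the commit message. It is not kernel code
and makes no TDX calls: every sketch_* name is hypothetical,
sketch_virt_to_phys() merely stands in for slow_virt_to_phys(), and the
convert callback stands in for tdx_enc_status_changed_phys(). It assumes
a 4K page size and a caller-supplied table of backing frame numbers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumption: 4K base pages */

/* Hypothetical model of a vmalloc() area: page i is backed by pfn[i]. */
struct sketch_vmap {
	uintptr_t vaddr;	/* page-aligned start of the virtual range */
	size_t nr_pages;	/* number of pages in the range */
	const uint64_t *pfn;	/* backing frame numbers, in virtual order */
};

/* Hypothetical recorder for the physical ranges passed to the callback. */
struct sketch_log {
	uint64_t start_pa[8];
	size_t count;
};

typedef bool (*sketch_convert_fn)(uint64_t start_pa, uint64_t end_pa,
				  bool enc, void *ctx);

/* Stand-in for slow_virt_to_phys(): look up the frame behind one page. */
static uint64_t sketch_virt_to_phys(const struct sketch_vmap *m, uintptr_t va)
{
	size_t idx = (va - m->vaddr) / SKETCH_PAGE_SIZE;

	assert(idx < m->nr_pages);
	return m->pfn[idx] * SKETCH_PAGE_SIZE + (va - m->vaddr) % SKETCH_PAGE_SIZE;
}

/* Example callback: record each range instead of converting it. */
static bool sketch_record(uint64_t start_pa, uint64_t end_pa, bool enc,
			  void *ctx)
{
	struct sketch_log *log = ctx;

	(void)end_pa;
	(void)enc;
	if (log->count < 8)
		log->start_pa[log->count++] = start_pa;
	return true;
}

/*
 * Walk a virtually contiguous range one page at a time, translating each
 * page separately because the backing frames may be scattered.
 */
static bool sketch_enc_status_changed(const struct sketch_vmap *m,
				      uintptr_t vaddr, size_t numpages,
				      bool enc, sketch_convert_fn convert,
				      void *ctx)
{
	uintptr_t start = vaddr;
	uintptr_t end = start + numpages * SKETCH_PAGE_SIZE;

	/* Reject a start address that is not page-aligned. */
	if (start % SKETCH_PAGE_SIZE != 0)
		return false;

	while (start < end) {
		uint64_t start_pa = sketch_virt_to_phys(m, start);

		if (!convert(start_pa, start_pa + SKETCH_PAGE_SIZE, enc, ctx))
			return false;

		start += SKETCH_PAGE_SIZE;
	}

	return true;
}
```

The sketch stops at the first page that fails to convert and does not
roll back pages already converted. It also shows why PG_LEVEL_2M/1G
cannot be exploited on this path: each callback invocation covers
exactly one 4K page.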
diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
index 5574c91541a2..731be50b3d09 100644
--- a/arch/x86/coco/tdx/tdx.c
+++ b/arch/x86/coco/tdx/tdx.c
@@ -7,6 +7,7 @@
 #include <linux/cpufeature.h>
 #include <linux/export.h>
 #include <linux/io.h>
+#include <linux/mm.h>
 #include <asm/coco.h>
 #include <asm/tdx.h>
 #include <asm/vmx.h>
@@ -789,6 +790,34 @@ static bool try_accept_one(phys_addr_t *start, unsigned long len,
 	return true;
 }
 
+static bool try_accept_page(phys_addr_t start, phys_addr_t end)
+{
+	/*
+	 * For shared->private conversion, accept the page using
+	 * TDX_ACCEPT_PAGE TDX module call.
+	 */
+	while (start < end) {
+		unsigned long len = end - start;
+
+		/*
+		 * Try larger accepts first. It gives chance to VMM to keep
+		 * 1G/2M SEPT entries where possible and speeds up process by
+		 * cutting number of hypercalls (if successful).
+		 */
+
+		if (try_accept_one(&start, len, PG_LEVEL_1G))
+			continue;
+
+		if (try_accept_one(&start, len, PG_LEVEL_2M))
+			continue;
+
+		if (!try_accept_one(&start, len, PG_LEVEL_4K))
+			return false;
+	}
+
+	return true;
+}
+
 /*
  * Notify the VMM about page mapping conversion. More info about ABI
  * can be found in TDX Guest-Host-Communication Interface (GHCI),
@@ -838,6 +867,19 @@ static bool tdx_map_gpa(phys_addr_t start, phys_addr_t end, bool enc)
 	return !ret;
 }
 
+static bool tdx_enc_status_changed_phys(phys_addr_t start, phys_addr_t end,
+					bool enc)
+{
+	if (!tdx_map_gpa(start, end, enc))
+		return false;
+
+	/* private->shared conversion requires only MapGPA call */
+	if (!enc)
+		return true;
+
+	return try_accept_page(start, end);
+}
+
 /*
  * Inform the VMM of the guest's intent for this physical page: shared with
  * the VMM or private to the guest.  The VMM is expected to change its mapping
@@ -845,37 +887,23 @@ static bool tdx_map_gpa(phys_addr_t start, phys_addr_t end, bool enc)
  */
 static bool tdx_enc_status_changed(unsigned long vaddr, int numpages, bool enc)
 {
-	phys_addr_t start = __pa(vaddr);
-	phys_addr_t end   = __pa(vaddr + numpages * PAGE_SIZE);
+	unsigned long start = vaddr;
+	unsigned long end = start + numpages * PAGE_SIZE;
 
-	if (!tdx_map_gpa(start, end, enc))
+	if (offset_in_page(start) != 0)
 		return false;
 
-	/* private->shared conversion requires only MapGPA call */
-	if (!enc)
-		return true;
+	if (!is_vmalloc_addr((void *)start))
+		return tdx_enc_status_changed_phys(__pa(start), __pa(end), enc);
 
-	/*
-	 * For shared->private conversion, accept the page using
-	 * TDX_ACCEPT_PAGE TDX module call.
-	 */
 	while (start < end) {
-		unsigned long len = end - start;
+		phys_addr_t start_pa = slow_virt_to_phys((void *)start);
+		phys_addr_t end_pa = start_pa + PAGE_SIZE;
 
-		/*
-		 * Try larger accepts first. It gives chance to VMM to keep
-		 * 1G/2M SEPT entries where possible and speeds up process by
-		 * cutting number of hypercalls (if successful).
-		 */
-
-		if (try_accept_one(&start, len, PG_LEVEL_1G))
-			continue;
-
-		if (try_accept_one(&start, len, PG_LEVEL_2M))
-			continue;
-
-		if (!try_accept_one(&start, len, PG_LEVEL_4K))
+		if (!tdx_enc_status_changed_phys(start_pa, end_pa, enc))
 			return false;
+
+		start += PAGE_SIZE;
 	}
 
 	return true;
-- 
2.25.1