In TDX, to run a Linux guest, TDs (hardware-isolated VMs) must accept
private memory before accessing it. Accessing private memory before
acceptance is considered a fatal error and may result in the termination
of the TD.

The "accepting memory" operation in the guest includes the following
steps:
- trigger a VM-exit
- the host OS allocates a physical page and requests hardware to map the
  physical page to the GPA.
- initialize memory content to 0.
- encrypt the memory

For a Linux guest, eagerly accepting all memory during kernel boot can
slow down the boot process and cause unnecessary memory occupation on the
host for pages that may never be accessed. Therefore, Linux guests usually
opt for a lazy mode that delays page acceptance by not moving the pages to
the buddy allocator's freelists. Instead, the kernel tracks memory in 4M
units and places a unit on the zone->unaccepted_pages list if any page in
the entire 4M range is in an unaccepted state (even if part of the memory
range may have been accepted by firmware or the kernel). When the kernel
does not have enough free pages, it takes memory from the
zone->unaccepted_pages list and accepts it, ensuring that the memory is
accepted before it is moved to the freelists and becomes available to the
buddy allocator.

The kexec segments' destination addresses are not allocated by the buddy
allocator. Instead, they are searched from normal system RAM (top-down or
bottom-up) and exclude driver-managed memory, ACPI, persistent, and
reserved memory... Although these addresses may fall within the memory
range managed by the buddy allocator (which must be in an accepted state),
they could also be outside that range and in an unaccepted state.

Since the kexec code will access the segments' destination addresses
during the kexec process by swapping their content with the segments'
source pages, it is necessary to accept the memory before performing the
swap operations.
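For illustration only, the situation above can be modelled in userspace
with a toy bitmap of unaccepted units. This is a sketch under stated
assumptions, not kernel code: every toy_* name, TOY_UNIT_SIZE and
TOY_NR_UNITS are hypothetical, and the accept_calls counter merely stands
in for the expensive accept operation. It shows that accepting each
segment's destination range up front is harmless when part of the range is
already accepted, because units whose bit is already clear are skipped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tracking granule and range size, chosen for the example. */
#define TOY_UNIT_SIZE	(2ULL << 20)
#define TOY_NR_UNITS	64

/* Toy stand-in for the unaccepted-memory bitmap: true means "unaccepted". */
struct toy_unaccepted {
	bool unaccepted[TOY_NR_UNITS];
	unsigned int accept_calls;	/* counts simulated accept operations */
};

/* Toy stand-in for a kexec segment's destination range. */
struct toy_segment {
	uint64_t mem;
	uint64_t memsz;
};

/* Start with every unit unaccepted, as if nothing had been accepted yet. */
static void toy_init(struct toy_unaccepted *t)
{
	size_t u;

	for (u = 0; u < TOY_NR_UNITS; u++)
		t->unaccepted[u] = true;
	t->accept_calls = 0;
}

/* Accept every still-unaccepted unit overlapping [start, start + size). */
static void toy_accept_memory(struct toy_unaccepted *t, uint64_t start,
			      uint64_t size)
{
	uint64_t first, last, u;

	if (!size)
		return;

	first = start / TOY_UNIT_SIZE;
	last = (start + size - 1) / TOY_UNIT_SIZE;
	assert(last < TOY_NR_UNITS);

	for (u = first; u <= last; u++) {
		if (!t->unaccepted[u])
			continue;	/* already accepted: nothing to do */
		t->accept_calls++;	/* the costly accept would happen here */
		t->unaccepted[u] = false;
	}
}

/* Accept all destination ranges before anything touches them. */
static void toy_accept_segments(struct toy_unaccepted *t,
				const struct toy_segment *seg, size_t nr)
{
	size_t i;

	for (i = 0; i < nr; i++)
		toy_accept_memory(t, seg[i].mem, seg[i].memsz);
}
```

Calling toy_accept_segments() twice on the same segment list performs the
simulated accept only on the first call; the second call finds every bit
already clear and does nothing.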
Accept the destination addresses during the kexec load, immediately after
they pass sanity checks. This ensures the code is located in a common place
shared by both the kexec_load and kexec_file_load system calls.

This will not conflict with the accounting in try_to_accept_memory_one()
since the accounting is set during kernel boot and decremented when pages
are moved to the freelists. There is no harm in invoking accept_memory() on
a page before making it available to the buddy allocator.

No need to worry about re-accepting memory since accept_memory() checks the
unaccepted bitmap before accepting a memory page.

Although a user may perform kexec loading without ever triggering the jump,
it doesn't impact much since kexec loading is not in a performance-critical
path. Additionally, the destination addresses are always searched and found
in the same location on a given system.

Changes to the destination address searching logic to locate only memory in
either unaccepted or accepted status are unnecessary and complicated.

Signed-off-by: Yan Zhao <yan.y.zhao@xxxxxxxxx>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx>
Cc: Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx>
Cc: Baoquan He <bhe@xxxxxxxxxx>
---
 kernel/kexec_core.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index c0caa14880c3..f8eee0516bd9 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -210,6 +210,16 @@ int sanity_check_segment_list(struct kimage *image)
 	}
 #endif
 
+	/*
+	 * The destination addresses are searched from system RAM rather than
+	 * being allocated from the buddy allocator, so they are not guaranteed
+	 * to be accepted by the current kernel. Accept the destination
+	 * addresses before kexec swaps their content with the segments' source
+	 * pages to avoid accessing memory before it is accepted.
+	 */
+	for (i = 0; i < nr_segments; i++)
+		accept_memory(image->segment[i].mem, image->segment[i].memsz);
+
 	return 0;
 }
 
-- 
2.43.2