On Fri, 2022-07-08 at 11:34 +1200, Kai Huang wrote: > > Why not just entirely remove the lower 1MB from the memblock structure > > on TDX systems? Do something equivalent to adding this on the kernel > > command line: > > > > memmap=1M$0x0 > > I will explore this option. Thanks! Hi Dave, After investigating and testing, we cannot simply remove first 1MB from e820 table which is similar to what 'memmap=1M$0x0' does, as the kernel needs low memory as trampoline to bring up all APs. Currently I am doing below: --- a/arch/x86/realmode/init.c +++ b/arch/x86/realmode/init.c @@ -65,6 +65,17 @@ void __init reserve_real_mode(void) * setup_arch(). */ memblock_reserve(0, SZ_1M); + + /* + * As one step of initializing the TDX module (on-demand), the + * kernel will later verify all memory regions in memblock are + * truly TDX-capable and convert all of them to TDX memory. + * The first 1MB may not be enumerated as TDX-capable memory. + * To avoid failure to verify, explicitly remove the first 1MB + * from memblock for a TDX (BIOS) enabled system. + */ + if (platform_tdx_enabled()) + memblock_remove(0, SZ_1M); I tested an it worked (I didn't observe any problem), but am I missing something? Also, regarding to whether we can remove platform_tdx_enabled() at all, I looked into the spec again and there's no MSR or CPUID from which we can check TDX is enabled by BIOS -- except checking the SEAMRR_MASK MSR, which is basically platform_tdx_enabled() also did. Checking MSR_MTRRcap.SEAMRR bit isn't enough as it will be true as long as the hardware supports SEAMRR, but it doesn't tell whether SEAMRR(TDX) is enabled by BIOS. So if above code is reasonable, I think we can still detect TDX during boot and keep platform_tdx_enabled(). It also detects TDX KeyIDs, which isn't necessary for removing the first 1MB here (nor for kexec() support), but detecting TDX KeyIDs must be done anyway either during kernel boot or during initializing TDX module. Detecting TDX KeyID at boot time also has an advantage that in the future we can expose KeyIDs via /sysfs and userspace can know how many TDs the machine can support w/o having to initializing the TDX module first (we received such requirement from customer but yes it is arguable). Any comments? -- Thanks, -Kai