Hi, Oscar, Oscar Salvador <osalvador@xxxxxxx> writes: > On Tue, Oct 22, 2024 at 11:16:17AM +0800, Huang Ying wrote: >> On systems with TDX (Trust Domain eXtensions) enabled, current kernel >> checks the TDX compatibility of the hot-added memory ranges through a >> memory hotplug notifier for each memory_block. If a memory range >> which isn't TDX compatible is hot-added, for example, some CXL memory, >> the command line as follows, >> >> $ echo 1 > /sys/devices/system/node/nodeX/memoryY/online >> >> will report something like, >> >> bash: echo: write error: Operation not permitted >> >> If pr_debug() is enabled, current kernel will show the error message >> like below in the kernel log, >> >> online_pages [mem 0xXXXXXXXXXX-0xXXXXXXXXXX] failed >> >> Both are too general to root cause the problem. This may confuse >> users. One solution is to print some error messages in the TDX memory >> hotplug notifier. However, kernel calls memory hotplug notifiers for >> each memory block, so this may lead to a large volume of messages in >> the kernel log if a large number of memory blocks are onlined with a >> script or automatically. For example, the typical size of memory >> block is 128MB on x86_64, when online 64GB CXL memory, 512 messages >> will be logged. >> >> Therefore, this patch checks the TDX compatibility of the whole >> hot-adding memory range through a newly added architecture specific >> function (arch_check_hotplug_memory_range()). If this patch rejects >> the memory hot-adding for TDX compatibility, it will output a kernel >> log message like below, >> >> virt/tdx: Reject hot-adding memory range: 0xXXXXXXXX-0xXXXXXXXX for TDX compatibility. >> >> The target use case is to support CXL memory on TDX enabled systems. >> If the CXL memory isn't compatible with TDX, the kernel will reject >> the whole CXL memory range. While the CXL memory can still be used >> via devdax interface. >> >> This also makes the original TDX memory hotplug notifier useless, so >> this patch deletes it. >> >> Signed-off-by: "Huang, Ying" <ying.huang@xxxxxxxxx> > > Acked-by: Oscar Salvador <osalvador@xxxxxxx> Thanks! > One question below: > > ... > >> +int tdx_check_hotplug_memory_range(u64 start, u64 size) >> { >> - struct memory_notify *mn = v; >> - >> - if (action != MEM_GOING_ONLINE) >> - return NOTIFY_OK; >> + u64 start_pfn = PHYS_PFN(start); >> + u64 end_pfn = PHYS_PFN(start + size); >> >> /* >> * Empty list means TDX isn't enabled. Allow any memory >> - * to go online. >> + * to be hot-added. >> */ >> if (list_empty(&tdx_memlist)) >> - return NOTIFY_OK; >> + return 0; >> >> /* >> * The TDX memory configuration is static and can not be >> - * changed. Reject onlining any memory which is outside of >> + * changed. Reject hot-adding any memory which is outside of >> * the static configuration whether it supports TDX or not. >> */ >> - if (is_tdx_memory(mn->start_pfn, mn->start_pfn + mn->nr_pages)) >> - return NOTIFY_OK; >> + if (is_tdx_memory(start_pfn, end_pfn)) >> + return 0; >> >> - return NOTIFY_BAD; >> + pr_info("Reject hot-adding memory range: %#llx-%#llx for TDX compatibility.\n", >> + start, start + size); > > Why not using pr_err() here? > > I was checking which kind of information level we use when failing at > hot-adding memory, and we seem to be using pr_err(), and pr_debug() when > onlining/offlining. > > Not a big deal, and not saying it is wrong, but was just wondering the reasoning > behind. TBH, I have no strong opinion about which log level is more appropriate. IMHO, it shouldn't be pr_debug() to make it easy for users to root cause the hot-adding failure. And, it appears too harsh to use pr_err(), because there's no program error, etc. So, I think that something in-between is more appropriate. That is, pr_warn(), pr_notice, or pr_info(). In them, I prefer pr_info() a little. -- Best Regards, Huang, Ying