On Wed, 2022-05-04 at 07:31 -0700, Dan Williams wrote: > On Tue, May 3, 2022 at 4:59 PM Kai Huang <kai.huang@xxxxxxxxx> wrote: > > > > On Fri, 2022-04-29 at 13:40 +1200, Kai Huang wrote: > > > On Thu, 2022-04-28 at 12:58 +1200, Kai Huang wrote: > > > > On Wed, 2022-04-27 at 17:50 -0700, Dave Hansen wrote: > > > > > On 4/27/22 17:37, Kai Huang wrote: > > > > > > On Wed, 2022-04-27 at 14:59 -0700, Dave Hansen wrote: > > > > > > > In 5 years, if someone takes this code and runs it on Intel hardware > > > > > > > with memory hotplug, CPU hotplug, NVDIMMs *AND* TDX support, what happens? > > > > > > > > > > > > I thought we could document this in the documentation saying that this code can > > > > > > only work on TDX machines that don't have above capabilities (SPR for now). We > > > > > > can change the code and the documentation when we add the support of those > > > > > > features in the future, and update the documentation. > > > > > > > > > > > > If 5 years later someone takes this code, he/she should take a look at the > > > > > > documentation and figure out that he/she should choose a newer kernel if the > > > > > > machine support those features. > > > > > > > > > > > > I'll think about design solutions if above doesn't look good for you. > > > > > > > > > > No, it doesn't look good to me. > > > > > > > > > > You can't just say: > > > > > > > > > > /* > > > > > * This code will eat puppies if used on systems with hotplug. > > > > > */ > > > > > > > > > > and merrily await the puppy bloodbath. > > > > > > > > > > If it's not compatible, then you have to *MAKE* it not compatible in a > > > > > safe, controlled way. > > > > > > > > > > > > You can't just ignore the problems because they're not present on one > > > > > > > version of the hardware. > > > > > > > > > > Please, please read this again ^^ > > > > > > > > OK. I'll think about solutions and come back later. > > > > > > > > > > > Hi Dave, > > > > > > I think we have two approaches to handle memory hotplug interaction with the TDX > > > module initialization. > > > > > > The first approach is simple. We just block memory from being added as system > > > RAM managed by page allocator when the platform supports TDX [1]. It seems we > > > can add some arch-specific-check to __add_memory_resource() and reject the new > > > memory resource if platform supports TDX. __add_memory_resource() is called by > > > both __add_memory() and add_memory_driver_managed() so it prevents from adding > > > NVDIMM as system RAM and normal ACPI memory hotplug [2]. > > > > Hi Dave, > > > > Try to close how to handle memory hotplug. Any comments to below? > > > > For the first approach, I forgot to think about memory hot-remove case. If we > > just reject adding new memory resource when TDX is capable on the platform, then > > if the memory is hot-removed, we won't be able to add it back. My thinking is > > we still want to support memory online/offline because it is purely in software > > but has nothing to do with TDX. But if one memory resource can be put to > > offline, it seems we don't have any way to prevent it from being removed. > > > > So if we do above, on the future platforms when memory hotplug can co-exist with > > TDX, ACPI hot-add and kmem-hot-add memory will be prevented. However if some > > memory is hot-removed, it won't be able to be added back (even it is included in > > CMR, or TDMRs after TDX module is initialized). > > > > Is this behavior acceptable? Or perhaps I have misunderstanding? > > Memory online at boot uses similar kernel paths as memory-online at > run time, so it sounds like your question is confusing physical vs > logical remove. Consider the case of logical offline then re-online > where the proposed TDX sanity check blocks the memory online, but then > a new kernel is kexec'd and that kernel again trusts the memory as TD > convertible again just because it onlines the memory in the boot path. > For physical memory remove it seems up to the platform to block that > if it conflicts with TDX, not for the kernel to add extra assumptions > that logical offline / online is incompatible with TDX. Hi Dan, No we don't block memory online, but we block memory add. The code I mentioned is add_memory_resource(), while memory online code path is memory_block_online(). Or am I wrong? -- Thanks, -Kai