On Thu, Apr 28, 2022 at 6:40 PM Kai Huang <kai.huang@xxxxxxxxx> wrote:
>
> On Thu, 2022-04-28 at 12:58 +1200, Kai Huang wrote:
> > On Wed, 2022-04-27 at 17:50 -0700, Dave Hansen wrote:
> > > On 4/27/22 17:37, Kai Huang wrote:
> > > > On Wed, 2022-04-27 at 14:59 -0700, Dave Hansen wrote:
> > > > > In 5 years, if someone takes this code and runs it on Intel hardware
> > > > > with memory hotplug, CPU hotplug, NVDIMMs *AND* TDX support, what happens?
> > > >
> > > > I thought we could document this in the documentation saying that this code can
> > > > only work on TDX machines that don't have above capabilities (SPR for now). We
> > > > can change the code and the documentation when we add the support of those
> > > > features in the future, and update the documentation.
> > > >
> > > > If 5 years later someone takes this code, he/she should take a look at the
> > > > documentation and figure out that he/she should choose a newer kernel if the
> > > > machine support those features.
> > > >
> > > > I'll think about design solutions if above doesn't look good for you.
> > >
> > > No, it doesn't look good to me.
> > >
> > > You can't just say:
> > >
> > >     /*
> > >      * This code will eat puppies if used on systems with hotplug.
> > >      */
> > >
> > > and merrily await the puppy bloodbath.
> > >
> > > If it's not compatible, then you have to *MAKE* it not compatible in a
> > > safe, controlled way.
> > >
> > > > > You can't just ignore the problems because they're not present on one
> > > > > version of the hardware.
> > >
> > > Please, please read this again ^^
> >
> > OK. I'll think about solutions and come back later.
> >
> Hi Dave,
>
> I think we have two approaches to handle memory hotplug interaction with the TDX
> module initialization.
>
> The first approach is simple. We just block memory from being added as system
> RAM managed by page allocator when the platform supports TDX [1]. It seems we
> can add some arch-specific-check to __add_memory_resource() and reject the new
> memory resource if platform supports TDX. __add_memory_resource() is called by
> both __add_memory() and add_memory_driver_managed() so it prevents from adding
> NVDIMM as system RAM and normal ACPI memory hotplug [2].

What if the memory being added *is* TDX capable? What if someone
wanted to manage a memory range as soft-reserved and move it back and
forth between the core-mm and device access? That should be perfectly
acceptable as long as the memory is TDX capable.

> The second approach is relatively more complicated. Instead of directly
> rejecting the new memory resource in __add_memory_resource(), we check whether
> the memory resource can be added based on CMR and the TDX module initialization
> status. This is feasible as with the latest public P-SEAMLDR spec, we can get
> CMR from P-SEAMLDR SEAMCALL[3]. So we can detect P-SEAMLDR and get CMR info
> during kernel boots. And in __add_memory_resource() we do below check:
>
>     tdx_init_disable(); /*similar to cpu_hotplug_disable() */
>     if (tdx_module_initialized())
>         // reject memory hotplug
>     else if (new_memory_resource NOT in CMRs)
>         // reject memory hotplug
>     else
>         allow memory hotplug
>     tdx_init_enable(); /*similar to cpu_hotplug_enable() */
>
> tdx_init_disable() temporarily disables TDX module initialization by trying to
> grab the mutex. If the TDX module initialization is already on going, then it
> waits until it completes.
>
> This should work better for future platforms, but would requires non-trivial
> more code as we need to add VMXON/VMXOFF support to the core-kernel to detect
> CMR using SEAMCALL. A side advantage is with VMXON in core-kernel we can
> shutdown the TDX module in kexec().
>
> But for this series I think the second approach is overkill and we can choose to
> use the first simple approach?

This still sounds like it is trying to solve symptoms and not the root
problem. Why must the core-mm never have non-TDX memory when VMs are
fine to operate with either core-mm pages or memory from other sources
like hugetlbfs and device-dax?
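
To make the second approach concrete, the decision it describes could be
modeled in plain C roughly as below. This is only a userspace sketch under
stated assumptions, not kernel code: struct mem_range, range_in_cmrs() and
tdx_allow_memory_hotplug() are hypothetical names, the CMR list is passed in
rather than read via SEAMCALL, and the tdx_init_disable()/tdx_init_enable()
locking from the proposal is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a physical address range [start, end). */
struct mem_range {
	uint64_t start;
	uint64_t end;
};

/*
 * Hypothetical helper: true if r lies entirely inside one of the n CMRs.
 * Simplification: a range that spans two adjacent CMRs is rejected.
 */
bool range_in_cmrs(const struct mem_range *r,
		   const struct mem_range *cmrs, size_t n)
{
	assert(r->start < r->end);

	for (size_t i = 0; i < n; i++) {
		if (r->start >= cmrs[i].start && r->end <= cmrs[i].end)
			return true;
	}
	return false;
}

/*
 * Model of the check proposed for __add_memory_resource().
 * In the proposal this would run between tdx_init_disable() and
 * tdx_init_enable(); that serialization is not modeled here.
 */
bool tdx_allow_memory_hotplug(bool module_initialized,
			      const struct mem_range *new_res,
			      const struct mem_range *cmrs, size_t n)
{
	/* Once the TDX module is initialized, reject all hotplug. */
	if (module_initialized)
		return false;

	/* Otherwise only accept memory that is covered by a CMR. */
	return range_in_cmrs(new_res, cmrs, n);
}
```

Note that even in this model, TDX-capable memory is rejected as soon as the
module has been initialized, which is the case the questions above are about.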