On Wed, 2022-10-26 at 16:26 -0700, Dave Hansen wrote:
> On 10/26/22 16:15, Kai Huang wrote:
> > To keep things simple, this series doesn't handle memory hotplug at all,
> > but depends on the machine owner to not do any memory hotplug operation.
> > For example, the machine owner should not plug any NVDIMM and CXL memory
> > into the machine, or use kmem driver to plug NVDIMM or CXL memory to the
> > core-mm.
> >
> > This will be enhanced in the future after first submission.  We are also
> > looking into options on how to handle:
>
> This is also known as the "hopes and prayers" approach to software
> enabling.  "Let's just hope and pray that nobody does these things which
> we know are broken."
>
> In the spirit of moving this submission forward, I'm willing to continue
> to _review_ this series.

Thank you Dave!

> But, I don't think it can go upstream until it
> contains at least _some_ way to handle memory hotplug.

Yes, I agree.  One intention of sending out this series is actually to get
feedback on how to handle it.  As mentioned in the cover letter, AFAICT we
have two options:

1) Enforce that the kernel always guarantees all pages in the page allocator
   are TDX memory (i.e. by rejecting non-TDX memory in memory hotplug).
   Non-TDX memory can still be used via devdax.
2) Manage TDX and non-TDX memory in different NUMA nodes, and use a per-node
   TDX memory capability flag to show which nodes are TDX-capable.  Userspace
   needs to explicitly bind TDX guests to those TDX-capable NUMA nodes.

I think the important thing is that we need to get consensus on which
direction to go, as this is kinda related to userspace ABI AFAICT.

Kirill has some thoughts on the second option, such as that we may need some
additional work to split a NUMA node which contains both TDX and non-TDX
memory.  I am not entirely clear how hard this work will be, but my thinking
is that the above two are not necessarily conflicting.
For example, from the userspace ABI's perspective we can go with option 2
while, in the meantime, still rejecting hotplug of non-TDX memory.  This is
effectively equivalent to reporting all nodes as TDX-capable.  Splitting NUMA
nodes which contain both TDX and non-TDX memory can be added in the future, as
it doesn't break the userspace ABI -- userspace needs to explicitly bind TDX
guests to TDX-capable nodes anyway.
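
To make the combination above a bit more concrete, here is a rough userspace C
sketch of the policy.  It is not kernel code.  All names (tdx_ranges,
tdx_memory_hotplug_check(), node_is_tdx_capable()) and the address ranges are
hypothetical and only for illustration; the real implementation may well look
different.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: one physical address range usable as TDX memory, [start, end). */
struct tdx_mem_range {
	uint64_t start;
	uint64_t end;
};

/* Hypothetical table of TDX-usable ranges, assumed to be fixed at boot. */
static const struct tdx_mem_range tdx_ranges[] = {
	{ 0x000100000ULL, 0x080000000ULL },
	{ 0x100000000ULL, 0x200000000ULL },
};

/* Return true if [start, end) lies entirely inside one TDX-usable range. */
static bool range_is_tdx_memory(uint64_t start, uint64_t end)
{
	size_t i;

	if (start >= end)
		return false;

	for (i = 0; i < sizeof(tdx_ranges) / sizeof(tdx_ranges[0]); i++) {
		if (start >= tdx_ranges[i].start && end <= tdx_ranges[i].end)
			return true;
	}
	return false;
}

/*
 * Hotplug gate: return 0 to accept the new memory, -1 to reject it.
 * Rejecting non-TDX memory keeps every page in the page allocator
 * usable for TDX guests.
 */
static int tdx_memory_hotplug_check(uint64_t start, uint64_t size)
{
	uint64_t end = start + size;

	if (end < start)	/* address overflow */
		return -1;

	return range_is_tdx_memory(start, end) ? 0 : -1;
}

/*
 * Per-node capability as userspace would see it under option 2.  With the
 * hotplug gate above in place no node can contain non-TDX memory, so every
 * node is reported as TDX-capable.
 */
static bool node_is_tdx_capable(int nid)
{
	(void)nid;
	return true;
}
```

If node splitting is added later, only node_is_tdx_capable() would need to
start returning false for some nodes; the userspace-visible contract (bind TDX
guests to TDX-capable nodes) stays the same.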