... adding some KVM/TDX folks

On 5/6/22 12:02, Boris Petkov wrote:
>> This node attribute punts the problem back out to userspace. It
>> gives userspace the ability to steer allocations to compatible NUMA
>> nodes. If something goes wrong, they can use other NUMA ABIs to
>> inspect the situation, like /proc/$pid/numa_maps.
> That's all fine and dandy but I still don't see the *actual*,
> real-life use case of why something would request memory of
> particular encryption capabilities. Don't get me wrong - I'm not
> saying there are not such use cases - I'm saying we should go all the
> way and fully define properly *why* we're doing this whole hoopla.

Let's say TDX is running on a system with mixed encryption
capabilities*. Some NUMA nodes support TDX and some don't. If that
happens, your guest RAM can come from anywhere. When the host kernel
calls into the TDX module to add pages to the guest (via
TDH.MEM.PAGE.ADD) it might get an error back from the TDX module. At
that point, the host kernel is stuck. It's got a partially created
guest and no recourse to fix the error.

This new ABI provides a way to avoid that situation in the first
place. Userspace can look at sysfs to figure out which NUMA nodes
support "encryption" (aka. TDX) and can use the existing NUMA policy
ABI to avoid TDH.MEM.PAGE.ADD failures.

So, here's the question for the TDX folks: are these mixed-capability
systems a problem for you? Does this ABI help you fix the problem?
Will your userspace (qemu and friends) actually consume this ABI?

* There are three ways we might hit a system with this issue:

  1. NVDIMMs that don't support TDX, e.g. because they lack memory
     integrity protection.
  2. CXL-attached memory controllers that can't do encryption at all.
  3. Nominally TDX-compatible memory that was not covered/converted by
     the kernel for some reason (memory hot-add, or ran out of TDMR
     resources).
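
To make the userspace side concrete, here is a rough sketch of the "look at sysfs, then apply NUMA policy" flow described above. It is not taken from any existing tool. The attribute name "crypto_capable" is an assumption (this mail does not name the file), and both helper functions are hypothetical. On a real system node_root would be /sys/devices/system/node; it is a parameter so the logic can be tried against a fake directory tree.

```python
import os
import re


def encryption_capable_nodes(node_root, attr="crypto_capable"):
    """Return the sorted NUMA node IDs whose per-node attribute reads "1".

    `attr` is an assumed attribute name. Nodes that lack the file are
    treated as not capable, which is the conservative choice.
    """
    capable = []
    for entry in os.listdir(node_root):
        match = re.fullmatch(r"node(\d+)", entry)
        if not match:
            continue  # skip non-node entries such as "online" or "has_cpu"
        path = os.path.join(node_root, entry, attr)
        try:
            with open(path) as f:
                value = f.read().strip()
        except OSError:
            continue  # attribute missing or unreadable
        if value == "1":
            capable.append(int(match.group(1)))
    return sorted(capable)


def membind_command(nodes, guest_cmd):
    """Prefix a guest launch command with a strict numactl memory binding."""
    if not nodes:
        raise RuntimeError("no encryption-capable NUMA node found")
    node_list = ",".join(str(n) for n in nodes)
    return ["numactl", "--membind=" + node_list] + list(guest_cmd)
```

A launcher could call encryption_capable_nodes("/sys/devices/system/node") and pass the result to membind_command() before starting the guest. Strict binding is used here on purpose: with a preferred-node policy the allocation could still fall back to a non-capable node, which is exactly the TDH.MEM.PAGE.ADD failure this is trying to avoid.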