On 7/20/22 03:18, Kai Huang wrote:
> Try to close on how to handle memory hotplug. After discussion, below will be
> architectural behaviour of TDX in terms of ACPI memory hotplug:
>
> 1) During platform boot, CMRs must be physically present. MCHECK verifies all
> CMRs are physically present and are actually TDX convertible memory.

I doubt this is strictly true. This makes it sound like MCHECK is doing
*ACTUAL* verification that the memory is, in practice, convertible.
That would mean actually writing to it, which would take a long time for
a large system.

Does it *ACTUALLY* verify this?

Also, it's very odd to say that "CMRs must be physically present". A
CMR itself is a logical construct. The physical memory *backing* a CMR
is something else entirely.

> 2) CMRs are static after platform boots and don't change at runtime. TDX
> architecture doesn't support hot-add or hot-removal of CMR memory.
> 3) TDX architecture doesn't forbid non-CMR memory hotplug.
>
> Also, although TDX doesn't trust BIOS in terms of security, a non-buggy BIOS
> should prevent CMR memory from being hot-removed. If kernel ever receives such
> event, it's a BIOS bug, or even worse, the BIOS is compromised and under attack.
>
> As a result, the kernel should also never receive event of hot-add CMR memory.
> It is very much likely TDX is under attack (physical attack) in such case, i.e.
> someone is trying to physically replace any CMR memory.
>
> In terms of how to handle ACPI memory hotplug, my thinking is -- ideally, if the
> kernel can get the CMRs during kernel boot when detecting whether TDX is enabled
> by BIOS, we can do below:
>
> - For memory hot-removal, if the removed memory falls into any CMR, then kernel
> can speak loudly it is a BIOS bug. But when this happens, the hot-removal has
> been handled by BIOS thus kernel cannot actually prevent, so kernel can either
> BUG(), or just print error message. If the removed memory doesn't fall into
> CMR, we do nothing.

Hold on a sec.
Hot-removal is a two-step process. The kernel *MUST* know in advance
that the removal is going to occur. It follows that up with evacuating
the memory, giving the "all clear", then the actual physical removal can
occur.

I'm not sure what you're getting at with the "kernel cannot actually
prevent" bit. No sane system actively destroys perfectly good memory
content and tells the kernel about it after the fact.

> - For memory hot-add, if the new memory falls into any CMR, then kernel should
> speak loudly it is a BIOS bug, or even say "TDX is under attack" as this is only
> possible when CMR memory has been previously hot-removed.

I don't think this is strictly true. It's totally possible to get a
hot-add *event* for memory which is in a CMR. It would be another BIOS
bug, of course, but hot-remove is not a prerequisite purely for an event.

> And kernel should
> reject the new memory for security reason. If the new memory doesn't fall into
> any CMR, then we (also) just reject the new memory, as we want to guarantee all
> memory in page allocator are TDX pages. But this is basically due to kernel
> policy but not due to TDX architecture.

Agreed.

> BUT, since as the first step, we cannot get the CMR during kernel boot (as it
> requires additional code to put CPU into VMX operation), I think for now we can
> handle ACPI memory hotplug in below way:
>
> - For memory hot-removal, we do nothing.

This doesn't seem right to me. *If* we get a known-bogus hot-remove
event, we need to reject it. Remember, removal is a two-step process.

> - For memory hot-add, we simply reject the new memory when TDX is enabled by
> BIOS. This not only prevents the potential "physical attack of replacing any
> CMR memory",

I don't think there's *any* meaningful attack mitigation here. Even if
someone managed to replace the physical address space that backed some
private memory, the integrity checksums won't match. Memory integrity
mitigates physical replacement, not software.

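To make the removal and hot-add points above concrete, here is a rough
user-space sketch, not kernel code. It assumes the "ideal" case where
the CMR list is already known. Every name in it (struct cmr, hp_event,
hp_policy, range_overlaps_cmr) is made up for illustration and is not an
existing kernel or TDX interface.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one convertible memory range (CMR). */
struct cmr {
	uint64_t base;	/* physical start address */
	uint64_t size;	/* length in bytes */
};

/* Hypothetical events, loosely modelled on the two-step removal flow. */
enum hp_event {
	HP_REMOVE_REQUEST,	/* step 1: kernel is asked before removal */
	HP_ADD,			/* new memory is reported */
};

enum hp_verdict {
	HP_ALLOW,
	HP_REJECT,
};

/* True if [start, start + len) intersects any CMR. Assumes no wraparound. */
bool range_overlaps_cmr(const struct cmr *cmrs, size_t nr,
			uint64_t start, uint64_t len)
{
	for (size_t i = 0; i < nr; i++) {
		uint64_t cmr_end = cmrs[i].base + cmrs[i].size;

		if (start < cmr_end && cmrs[i].base < start + len)
			return true;
	}
	return false;
}

enum hp_verdict hp_policy(const struct cmr *cmrs, size_t nr,
			  enum hp_event ev, uint64_t start, uint64_t len)
{
	bool in_cmr = range_overlaps_cmr(cmrs, nr, start, len);

	switch (ev) {
	case HP_REMOVE_REQUEST:
		/* A request to remove CMR memory is bogus: refuse it up front. */
		return in_cmr ? HP_REJECT : HP_ALLOW;
	case HP_ADD:
		/* In a CMR: BIOS bug. Outside a CMR: rejected by kernel policy. */
		return HP_REJECT;
	}
	return HP_REJECT;
}
```

The removal decision is made at request time, before any memory is
evacuated. That is the only point at which the kernel can still say no.
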
> but also makes sure no non-CMR memory will be added to page
> allocator during runtime via ACPI memory hot-add.

Agreed. This one _is_ important and since it supports an existing
policy, it makes sense to enforce this in the kernel.

> We can improve this in next stage when we can get CMRs during kernel boot.
>
> For the concern that on a TDX BIOS enabled system, people may not want to use
> TDX at all but just use it as normal system, as I replied to Dan regarding to
> the driver-managed memory hotplug, we can provide a kernel commandline, i.e.
> use_tdx={on|off}, to allow user to *choose* between TDX and memory hotplug.
> When use_tdx=off, we continue to allow memory hotplug and driver-managed hotplug
> as normal but refuse to initialize TDX module.

That doesn't sound like a good resolution to me.

It conflates pure "software" hotplug operations like transitioning
memory ownership from the core mm to a driver (like device DAX).

TDX should not have *ANY* impact on purely software operations. Period.
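
One way to read that separation, as a minimal user-space sketch only.
The enum and the mem_add_allowed() helper are hypothetical names
invented here, not an existing kernel API. The idea is that the TDX
check keys off where the request came from, so software-only operations
never reach it.

```c
#include <stdbool.h>

/* Hypothetical origin of a request to add memory. */
enum mem_add_source {
	MEM_ADD_ACPI_HOTPLUG,	/* physical hot-add reported by firmware */
	MEM_ADD_SOFTWARE,	/* e.g. ownership moving between core mm and a driver */
};

/* Only firmware-reported hot-add is affected by TDX being enabled. */
bool mem_add_allowed(enum mem_add_source src, bool tdx_enabled_by_bios)
{
	if (src == MEM_ADD_SOFTWARE)
		return true;	/* purely software: TDX has no say */

	return !tdx_enabled_by_bios;
}
```

With this shape there is no need for a use_tdx= style switch just to
keep software operations working.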