On 05.08.22 13:49, Vlastimil Babka wrote:
> On 6/14/22 14:02, Kirill A. Shutemov wrote:
>> UEFI Specification version 2.9 introduces the concept of memory
>> acceptance. Some Virtual Machine platforms, such as Intel TDX or AMD
>> SEV-SNP, require memory to be accepted before it can be used by the
>> guest. Accepting happens via a protocol specific to the Virtual Machine
>> platform.
>>
>> There are several ways kernel can deal with unaccepted memory:
>>
>>  1. Accept all the memory during the boot. It is easy to implement and
>>     it doesn't have runtime cost once the system is booted. The downside
>>     is very long boot time.
>>
>>     Accept can be parallelized to multiple CPUs to keep it manageable
>>     (i.e. via DEFERRED_STRUCT_PAGE_INIT), but it tends to saturate
>>     memory bandwidth and does not scale beyond the point.
>>
>>  2. Accept a block of memory on the first use. It requires more
>>     infrastructure and changes in page allocator to make it work, but
>>     it provides good boot time.
>>
>>     On-demand memory accept means latency spikes every time kernel steps
>>     onto a new memory block. The spikes will go away once workload data
>>     set size gets stabilized or all memory gets accepted.
>>
>>  3. Accept all memory in background. Introduce a thread (or multiple)
>>     that gets memory accepted proactively. It will minimize time the
>>     system experience latency spikes on memory allocation while keeping
>>     low boot time.
>>
>>     This approach cannot function on its own. It is an extension of #2:
>>     background memory acceptance requires functional scheduler, but the
>>     page allocator may need to tap into unaccepted memory before that.
>>
>>     The downside of the approach is that these threads also steal CPU
>>     cycles and memory bandwidth from the user's workload and may hurt
>>     user experience.
>>
>> Implement #2 for now. It is a reasonable default. Some workloads may
>> want to use #1 or #3 and they can be implemented later based on user's
>> demands.
>>
>> Support of unaccepted memory requires a few changes in core-mm code:
>>
>>   - memblock has to accept memory on allocation;
>>
>>   - page allocator has to accept memory on the first allocation of the
>>     page;
>>
>> Memblock change is trivial.
>>
>> The page allocator is modified to accept pages on the first allocation.
>> The new page type (encoded in the _mapcount) -- PageUnaccepted() -- is
>> used to indicate that the page requires acceptance.
>>
>> Architecture has to provide two helpers if it wants to support
>> unaccepted memory:
>>
>>  - accept_memory() makes a range of physical addresses accepted.
>>
>>  - range_contains_unaccepted_memory() checks anything within the range
>>    of physical addresses requires acceptance.
>>
>> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx>
>> Acked-by: Mike Rapoport <rppt@xxxxxxxxxxxxx> # memblock
>> Reviewed-by: David Hildenbrand <david@xxxxxxxxxx>
>
> Hmm I realize it's not ideal to raise this at v7, and maybe it was discussed
> before, but it's really not great how this affects the core page allocator
> paths. Wouldn't it be possible to only release pages to page allocator when
> accepted, and otherwise use some new per-zone variables together with the
> bitmap to track how much exactly is where to accept? Then it could be hooked
> in get_page_from_freelist() similarly to CONFIG_DEFERRED_STRUCT_PAGE_INIT -
> if we fail zone_watermark_fast() and there are unaccepted pages in the zone,
> accept them and continue. With a static key to flip in case we eventually
> accept everything. Because this is really similar scenario to the deferred
> init and that one was solved in a way that adds minimal overhead.

I kind of like having the memory stats be correct (e.g., free memory) and
having acceptance be an internal detail that is triggered when allocating
pages -- just like the arch_alloc_page() callback.
I'm sure we could also optimize for the !unaccepted-memory case in this
version, via static keys and some checks in the right places, if we find
this hurts performance?

-- 
Thanks,

David / dhildenb