On Tue, May 16, 2023 at 02:44:00PM -0500, Tom Lendacky wrote:
> On 5/13/23 17:04, Kirill A. Shutemov wrote:
> > UEFI Specification version 2.9 introduces the concept of memory
> > acceptance. Some Virtual Machine platforms, such as Intel TDX or AMD
> > SEV-SNP, require memory to be accepted before it can be used by the
> > guest. Accepting happens via a protocol specific to the Virtual Machine
> > platform.
> >
> > There are several ways kernel can deal with unaccepted memory:
> >
> >  1. Accept all the memory during the boot. It is easy to implement and
> >     it doesn't have runtime cost once the system is booted. The downside
> >     is very long boot time.
> >
> >     Accept can be parallelized to multiple CPUs to keep it manageable
> >     (i.e. via DEFERRED_STRUCT_PAGE_INIT), but it tends to saturate
> >     memory bandwidth and does not scale beyond the point.
> >
> >  2. Accept a block of memory on the first use. It requires more
> >     infrastructure and changes in page allocator to make it work, but
> >     it provides good boot time.
> >
> >     On-demand memory accept means latency spikes every time kernel steps
> >     onto a new memory block. The spikes will go away once workload data
> >     set size gets stabilized or all memory gets accepted.
> >
> >  3. Accept all memory in background. Introduce a thread (or multiple)
> >     that gets memory accepted proactively. It will minimize time the
> >     system experience latency spikes on memory allocation while keeping
> >     low boot time.
> >
> >     This approach cannot function on its own. It is an extension of #2:
> >     background memory acceptance requires functional scheduler, but the
> >     page allocator may need to tap into unaccepted memory before that.
> >
> >     The downside of the approach is that these threads also steal CPU
> >     cycles and memory bandwidth from the user's workload and may hurt
> >     user experience.
> >
> > The patch implements #1 and #2 for now. #2 is the default. Some
> > workloads may want to use #1 with accept_memory=eager in kernel
> > command line. #3 can be implemented later based on user's demands.
> >
> > Support of unaccepted memory requires a few changes in core-mm code:
> >
> >   - memblock has to accept memory on allocation;
> >
> >   - page allocator has to accept memory on the first allocation of the
> >     page;
> >
> > Memblock change is trivial.
> >
> > The page allocator is modified to accept pages. New memory gets accepted
> > before putting pages on free lists. It is done lazily: only accept new
> > pages when we run out of already accepted memory. The memory gets
> > accepted until the high watermark is reached.
> >
> > EFI code will provide two helpers if the platform supports unaccepted
> > memory:
> >
> >  - accept_memory() makes a range of physical addresses accepted.
> >
> >  - range_contains_unaccepted_memory() checks anything within the range
> >    of physical addresses requires acceptance.
> >
> > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx>
> > Acked-by: Mike Rapoport <rppt@xxxxxxxxxxxxx>	# memblock
> > Reviewed-by: Vlastimil Babka <vbabka@xxxxxxx>
> > ---
> >  drivers/base/node.c    |   7 ++
> >  fs/proc/meminfo.c      |   5 ++
> >  include/linux/mm.h     |  19 +++++
> >  include/linux/mmzone.h |   8 ++
> >  mm/internal.h          |   1 +
> >  mm/memblock.c          |   9 +++
> >  mm/mm_init.c           |   7 ++
> >  mm/page_alloc.c        | 173 +++++++++++++++++++++++++++++++++++++++++
> >  mm/vmstat.c            |   3 +
> >  9 files changed, 232 insertions(+)
> >
>
> > diff --git a/mm/internal.h b/mm/internal.h
> > index 68410c6d97ac..b1db7ba5f57d 100644
> > --- a/mm/internal.h
> > +++ b/mm/internal.h
> > @@ -1099,4 +1099,5 @@ struct vma_prepare {
> >  	struct vm_area_struct *remove;
> >  	struct vm_area_struct *remove2;
> >  };
> > +
>
> Looks like an unintentional change.

Yep, will fix.

> >  #endif /* __MM_INTERNAL_H */
> > diff --git a/mm/memblock.c b/mm/memblock.c
> > index 3feafea06ab2..50b921119600 100644
> > --- a/mm/memblock.c
> > +++ b/mm/memblock.c
> > @@ -1436,6 +1436,15 @@ phys_addr_t __init memblock_alloc_range_nid(phys_addr_t size,
> >  		 */
> >  		kmemleak_alloc_phys(found, size, 0);
> > +	/*
> > +	 * Some Virtual Machine platforms, such as Intel TDX or AMD SEV-SNP,
> > +	 * require memory to be accepted before it can be used by the
> > +	 * guest.
> > +	 *
> > +	 * Accept the memory of the allocated buffer.
> > +	 */
> > +	accept_memory(found, found + size);
>
> I'm not an mm or memblock expert, but do we need to worry about freed memory
> from memblock_phys_free() being possibly doubly accepted? A double
> acceptance will trigger a guest termination on SNP.

There will be no double acceptance. accept_memory() will consult the
bitmap before accepting any memory. For already accepted memory it is a
nop.

-- 
  Kiryl Shutsemau / Kirill A. Shutemov
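
As an illustration of the answer above (the bitmap is consulted first, so
already accepted memory is a nop), here is a minimal user-space sketch of
the idea. It is not the kernel implementation: the names
sketch_accept_memory(), platform_accept(), UNIT_SIZE and NR_UNITS, the
2 MiB granularity, and the single 64-bit bitmap are all assumptions made
for the example. A set bit means the unit is still unaccepted.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical granularity: one bitmap bit covers 2 MiB. */
#define UNIT_SIZE (2ULL << 20)
/* Hypothetical size: a single 64-bit word tracks 64 units. */
#define NR_UNITS  64

/* Bit set = unit has not been accepted yet. */
uint64_t unaccepted_bitmap = ~0ULL;

/* Counts calls into the (stubbed) platform-specific accept protocol. */
unsigned long platform_accept_calls;

/* Stand-in for the TDX/SEV-SNP specific accept operation. */
void platform_accept(uint64_t start, uint64_t end)
{
	(void)start;
	(void)end;
	platform_accept_calls++;
}

/*
 * Accept every unit overlapping [start, end). Units whose bit is already
 * clear are skipped, so accepting the same range twice never reaches the
 * platform a second time.
 */
void sketch_accept_memory(uint64_t start, uint64_t end)
{
	uint64_t unit, last;

	assert(start <= end);

	last = (end + UNIT_SIZE - 1) / UNIT_SIZE;
	for (unit = start / UNIT_SIZE; unit < last && unit < NR_UNITS; unit++) {
		uint64_t mask = 1ULL << unit;

		if (!(unaccepted_bitmap & mask))
			continue;	/* already accepted: nop */

		platform_accept(unit * UNIT_SIZE, (unit + 1) * UNIT_SIZE);
		unaccepted_bitmap &= ~mask;
	}
}
```

With this shape, a buffer that is freed back and later allocated again
passes through sketch_accept_memory() a second time, but every bit it
covers is already clear, so the platform accept is not repeated.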