On Tue, Jul 04, 2023 at 05:37:40PM +0300, Kirill A. Shutemov wrote:
> On Mon, Jul 03, 2023 at 02:25:18PM +0100, Mel Gorman wrote:
> > On Tue, Jun 06, 2023 at 05:26:33PM +0300, Kirill A. Shutemov wrote:
> > > efi_config_parse_tables() reserves memory that holds unaccepted memory
> > > configuration table so it won't be reused by page allocator.
> > >
> > > Core-mm requires few helpers to support unaccepted memory:
> > >
> > >  - accept_memory() checks the range of addresses against the bitmap and
> > >    accept memory if needed.
> > >
> > >  - range_contains_unaccepted_memory() checks if anything within the
> > >    range requires acceptance.
> > >
> > > Architectural code has to provide efi_get_unaccepted_table() that
> > > returns pointer to the unaccepted memory configuration table.
> > >
> > > arch_accept_memory() handles arch-specific part of memory acceptance.
> > >
> > > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx>
> > > Reviewed-by: Ard Biesheuvel <ardb@xxxxxxxxxx>
> > > Reviewed-by: Tom Lendacky <thomas.lendacky@xxxxxxx>
> >
> > By and large, this looks ok from the page allocator perspective as the
> > checks for unaccepted are mostly after watermark checks. However, if you
> > look in the initial fast path, you'll see this
> >
> >         /*
> >          * Forbid the first pass from falling back to types that fragment
> >          * memory until all local zones are considered.
> >          */
> >         alloc_flags |= alloc_flags_nofragment(ac.preferred_zoneref->zone, gfp);
> >
> > While checking watermarks should be fine from a functional perspective and
> > the fast paths are unaffected, there is a risk of premature fragmentation
> > until all memory has been accepted. Meeting watermarks does not necessarily
> > mean that fragmentation is avoided as pageblocks can get mixed while still
> > meeting watermarks.
>
> Could you elaborate on this scenario?
>
> Current code checks the watermark, if it is met, try rmqueue().
>
> If rmqueue() fails anyway, try to accept more pages and retry the zone if
> it is successful.
>
> I'm not sure how we can get to the 'if (no_fallback) {' case with any
> unaccepted memory in the allowed zones.
>

Let's take an extreme example and assume that the low watermark is lower
than 2MB (one pageblock). Just before the watermark is reached (free count
between 1MB and 2MB), it is unlikely that all free pages are within
pageblocks of the same migratetype (e.g. MIGRATE_MOVABLE). If there is an
allocation near the watermark of a different type (e.g. MIGRATE_UNMOVABLE)
then the page allocation could fall back to a different pageblock and now
it is mixed. It's a condition that is only obvious if you are explicitly
checking for it via tracepoints. This can happen in the normal case, but
unaccepted memory makes it worse because the "pageblock mixing" could have
been avoided if the "no_fallback" case accepted at least one new pageblock
instead of mixing pageblocks.

That is an extreme example but the same logic applies when the free count
is at or near MIGRATE_TYPES*pageblock_nr_pages as it is not guaranteed that
the pageblocks with free pages are a migratetype that matches the
allocation request.

Hence, it may be more robust from a fragmentation perspective if
ALLOC_NOFRAGMENT requests accept memory, when any is available, and retry
before clearing ALLOC_NOFRAGMENT and mixing pageblocks before the
watermarks are reached.

> I see that there's preferred_zoneref and spread_dirty_pages cases, but
> unaccepted memory seems change nothing for them.
>

preferred_zoneref is about premature zone exhaustion and spread_dirty_pages
is about avoiding premature stalls on a node/zone due to an imbalance in
the number of pages waiting for writeback to complete. There is an argument
to be made that they also should accept memory but it's less clear how much
of a problem this is. Both are very obvious when they "fail" and likely are
covered by the existing watermark checks.
Premature pageblock mixing is more subtle as the final impact (root cause
of a premature THP allocation failure) is harder to detect.

-- 
Mel Gorman
SUSE Labs
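
To make the extreme example concrete, here is a rough userspace toy model.
It is not kernel code and not a proposed patch: every name in it (toy_zone,
toy_alloc, toy_extreme_example) is made up for illustration, it assumes 2MB
pageblocks of 4KB pages, and it ignores watermarks entirely. It only shows
that accepting one pageblock before falling back can avoid the mixing.

```c
#include <stdbool.h>

/* Toy userspace model. None of these names are kernel APIs. */
enum toy_type { TOY_UNMOVABLE, TOY_MOVABLE, TOY_NR_TYPES };

/* Assumption: 2MB pageblocks made of 4KB pages. */
#define TOY_PAGEBLOCK_PAGES 512UL

struct toy_zone {
	/* Free pages sitting in pageblocks of each migratetype. */
	unsigned long free_pages[TOY_NR_TYPES];
	/* Whole pageblocks that have not been accepted yet. */
	unsigned long unaccepted_blocks;
	/* Allocations served from a pageblock of another type. */
	unsigned long fallback_allocs;
};

/*
 * Allocate one page of @type. With @accept_first set, a request that
 * finds no free page of its own type accepts one new pageblock before
 * it is allowed to fall back to another type.
 */
static bool toy_alloc(struct toy_zone *z, enum toy_type type, bool accept_first)
{
	int t;

	if (!z->free_pages[type] && accept_first && z->unaccepted_blocks) {
		z->unaccepted_blocks--;
		z->free_pages[type] += TOY_PAGEBLOCK_PAGES;
	}

	if (z->free_pages[type]) {
		z->free_pages[type]--;
		return true;
	}

	/* Fallback: take a page from any other type, mixing that pageblock. */
	for (t = 0; t < TOY_NR_TYPES; t++) {
		if (z->free_pages[t]) {
			z->free_pages[t]--;
			z->fallback_allocs++;
			return true;
		}
	}
	return false;
}

/*
 * The extreme example: 1.5MB free (384 pages), all of it in movable
 * pageblocks, four pageblocks still unaccepted, then one unmovable
 * request. Returns the number of fallback allocations.
 */
static unsigned long toy_extreme_example(bool accept_first)
{
	struct toy_zone z = {
		.free_pages = { [TOY_MOVABLE] = 384 },
		.unaccepted_blocks = 4,
	};

	toy_alloc(&z, TOY_UNMOVABLE, accept_first);
	return z.fallback_allocs;
}
```

In this model toy_extreme_example(false) returns 1 (the unmovable request
lands in a movable pageblock) and toy_extreme_example(true) returns 0 (one
pageblock is accepted instead).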