On Wed, Jul 17, 2024 at 09:19:12AM +0200, Michal Hocko wrote:
> On Tue 16-07-24 16:00:13, Kirill A. Shutemov wrote:
> > Unaccepted memory is considered unusable free memory, which is not
> > counted as free on the zone watermark check. This causes
> > get_page_from_freelist() to accept more memory to hit the high
> > watermark, but it creates problems in the reclaim path.
> >
> > The reclaim path encounters a failed zone watermark check and attempts
> > to reclaim memory. This is usually successful, but if there is little or
> > no reclaimable memory, it can result in endless reclaim with little to
> > no progress. This can occur early in the boot process, just after start
> > of the init process when the only reclaimable memory is the page cache
> > of the init executable and its libraries.
>
> How does this happen when try_to_accept_memory is the first thing to do
> when wmark check fails in the allocation path?

Good question.

I've lost access to the test setup and cannot check it directly right now.

Reading the code, it looks like __alloc_pages_bulk() bypasses
get_page_from_freelist(), where we usually accept more pages, and goes
directly to __rmqueue_pcplist() -> rmqueue_bulk() -> __rmqueue().

I will look into it more when I have access to the test setup again.

> Could you describe what was the initial configuration of the system? How
> much of the unaccepted memory was there to trigger this?

This is a large TDX guest VM: 176 vCPUs and ~800GiB of memory.

One thing I noticed is that the problem is only triggered when LRU_GEN is
enabled, but I failed to identify why.

The system hangs (or makes very little progress) shortly after systemd
starts.

> > To address this issue, teach shrink_node() and shrink_zones() to accept
> > memory before attempting to reclaim.
> >
> > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx>
> > Reported-by: Jianxiong Gao <jxgao@xxxxxxxxxx>
> > Fixes: dcdfdd40fa82 ("mm: Add support for unaccepted memory")
> > Cc: stable@xxxxxxxxxxxxxxx # v6.5+
> [...]
> >  static void shrink_node(pg_data_t *pgdat, struct scan_control *sc)
> >  {
> >  	unsigned long nr_reclaimed, nr_scanned, nr_node_reclaimed;
> >  	struct lruvec *target_lruvec;
> >  	bool reclaimable = false;
> >
> > +	/* Try to accept memory before going for reclaim */
> > +	if (node_try_to_accept_memory(pgdat, sc)) {
> > +		if (!should_continue_reclaim(pgdat, 0, sc))
> > +			return;
> > +	}
> > +
>
> This would need an exemption from the memcg reclaim.

Hm. Could you elaborate why?

-- 
  Kiryl Shutsemau / Kirill A. Shutemov
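
For illustration only, the failure mode described in the quoted commit
message can be reproduced in a toy userspace model. This is not kernel
code: struct toy_zone, toy_watermark_ok(), toy_reclaim(), toy_accept()
and toy_balance() are hypothetical names made up for this sketch, and the
batch size of 32 pages is arbitrary. The only thing it tries to show is
that a watermark check which ignores unaccepted pages cannot be satisfied
by reclaim alone once the reclaimable pages run out, while accepting
memory first makes progress.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a zone. Hypothetical; not the kernel's struct zone. */
struct toy_zone {
	unsigned long free;        /* accepted free pages, counted by the check */
	unsigned long unaccepted;  /* free but unaccepted pages, NOT counted */
	unsigned long reclaimable; /* e.g. page cache that reclaim can free */
	unsigned long high_wmark;  /* target for the watermark check */
};

/* Unaccepted pages are deliberately ignored here, as in the description. */
static bool toy_watermark_ok(const struct toy_zone *z)
{
	return z->free >= z->high_wmark;
}

/* Move up to nr reclaimable pages to the free pool; return pages moved. */
static unsigned long toy_reclaim(struct toy_zone *z, unsigned long nr)
{
	unsigned long got = nr < z->reclaimable ? nr : z->reclaimable;

	z->reclaimable -= got;
	z->free += got;
	return got;
}

/* Move up to nr unaccepted pages to the free pool; return pages moved. */
static unsigned long toy_accept(struct toy_zone *z, unsigned long nr)
{
	unsigned long got = nr < z->unaccepted ? nr : z->unaccepted;

	z->unaccepted -= got;
	z->free += got;
	return got;
}

/*
 * Loop until the watermark check passes or max_rounds is exhausted.
 * Returns the number of rounds that were needed, or -1 if the
 * watermark is still not met after max_rounds.
 */
static int toy_balance(struct toy_zone *z, bool accept_first, int max_rounds)
{
	assert(max_rounds > 0);

	for (int round = 0; round < max_rounds; round++) {
		if (toy_watermark_ok(z))
			return round;
		/* Proposed ordering: try to accept before reclaiming. */
		if (accept_first && toy_accept(z, 32) > 0)
			continue;
		toy_reclaim(z, 32);
	}
	return toy_watermark_ok(z) ? max_rounds : -1;
}
```

Starting from free = 10, unaccepted = 1000, reclaimable = 4 and
high_wmark = 100, toy_balance(&z, false, 1000) returns -1: reclaim frees
the 4 reclaimable pages in the first round and then spins without
progress. With accept_first set to true, the same starting state reaches
the watermark after 3 rounds.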