On 6/6/24 3:32 PM, Erhard Furtner wrote:
> On Thu, 6 Jun 2024 09:24:56 +0200
> "Vlastimil Babka (SUSE)" <vbabka@xxxxxxxxxx> wrote:
>
>> Besides the zpool commit which might have just pushed the machine over the
>> edge, but it was probably close to it already. I've noticed a more general
>> problem that there are GFP_KERNEL allocations failing from kswapd. Those
>> could probably use be __GFP_NOMEMALLOC (or scoped variant, is there one?)
>> since it's the case of "allocating memory to free memory". Or use mempools
>> if the progress (success will lead to freeing memory) is really guaranteed.
>>
>> Another interesting data point could be to see if traditional reclaim
>> behaves any better on this machine than MGLRU. I saw in the config:
>>
>> CONFIG_LRU_GEN=y
>> CONFIG_LRU_GEN_ENABLED=y
>>
>> So disabling at least the second one would revert to the traditional reclaim
>> and we could see if it handles such a constrained system better or not.
>
> I set RANDOM_KMALLOC_CACHES=n and LRU_GEN_ENABLED=n but still hit the issue.
>
> dmesg looks a bit different (unpatched v6.10-rc2).

What caught my eye, and it also shows up in some of the previous dmesg output
with MGLRU, is that in one case there's:

DMA free:0kB

That means many allocations that are allowed to just ignore all reserves went
through and depleted everything. That would mean __GFP_MEMALLOC or
PF_MEMALLOC, which I suggested earlier for the GFP_KERNEL failure, is being
used somewhere, but not leading to the expected memory freeing.

> Regards,
> Erhard
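
In case it helps with going through the other dmesg captures, here is a rough sketch of a script that flags zones whose free memory has dropped below the min watermark. Anything below min means some reserve was dipped into, and free:0kB is the extreme case mentioned above. The function name, the regex, and the sample lines are my own assumptions for illustration; the sample is made up and not copied from the actual logs, and the zone line layout may differ between kernel versions.

```python
import re

# Matches the per-zone summary printed on allocation failure, e.g.
# "DMA free:0kB boost:0kB min:3440kB ...". Some kernels prefix it with
# "Node N". The list of zone names here is an assumption.
ZONE_RE = re.compile(
    r"(?:Node\s+\d+\s+)?(?P<zone>DMA32|DMA|Normal|HighMem|Movable)\s+"
    r"free:(?P<free>\d+)kB\b.*?\bmin:(?P<min>\d+)kB"
)


def depleted_zones(dmesg_text):
    """Return (zone, free_kb, min_kb) for zones with free below min."""
    hits = []
    for line in dmesg_text.splitlines():
        m = ZONE_RE.search(line)
        if m is None:
            continue  # not a zone summary line
        free_kb, min_kb = int(m["free"]), int(m["min"])
        if free_kb < min_kb:
            hits.append((m["zone"], free_kb, min_kb))
    return hits


# Made-up sample lines for illustration only, not taken from a real dmesg.
SAMPLE = """\
[  123.456789] DMA free:0kB boost:0kB min:3440kB low:4300kB high:5160kB
[  123.456790] HighMem free:51200kB boost:0kB min:512kB low:1024kB high:1536kB
"""

if __name__ == "__main__":
    for zone, free_kb, min_kb in depleted_zones(SAMPLE):
        print(f"{zone}: free {free_kb}kB is below min {min_kb}kB")
```

To run it against a saved log, read the file into a string and pass it to depleted_zones(). It only reports which zones are below min; it says nothing about which caller used __GFP_MEMALLOC or PF_MEMALLOC.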