[Cc Vlastimil - logs are http://lkml.kernel.org/r/1d9ee19a-98c9-cd78-1e5b-21d9d6e36792@xxxxxxxxxxxx]

On Mon 09-09-19 10:54:21, Stefan Priebe - Profihost AG wrote:
> Hello Michal,
>
> On 09.09.19 at 10:27, Michal Hocko wrote:
> > On Fri 06-09-19 12:08:31, Stefan Priebe - Profihost AG wrote:
> >> These are the biggest differences in meminfo before and after cached
> >> starts to drop. I didn't expect cached end up in MemFree.
> >>
> >> Before:
> >> MemTotal: 16423116 kB
> >> MemFree: 374572 kB
> >> MemAvailable: 5633816 kB
> >> Cached: 5550972 kB
> >> Inactive: 4696580 kB
> >> Inactive(file): 3624776 kB
> >>
> >>
> >> After:
> >> MemTotal: 16423116 kB
> >> MemFree: 3477168 kB
> >> MemAvailable: 6066916 kB
> >> Cached: 2724504 kB
> >> Inactive: 1854740 kB
> >> Inactive(file): 950680 kB
> >>
> >> Any explanation?
> >
> > Do you have more snapshots of /proc/vmstat as suggested by Vlastimil and
> > me earlier in this thread? Seeing the overall progress would tell us
> > much more than before and after. Or have I missed this data?
>
> I needed to wait until today to grab again such a situation but from
> what i know it is very clear that MemFree is low and than the kernel
> starts to drop the chaches.
>
> Attached you'll find two log files.

$ grep pgsteal_kswapd vmstat | uniq -c
   1331 pgsteal_kswapd 37142300
$ grep pgscan_kswapd vmstat | uniq -c
   1331 pgscan_kswapd 37285092

Both counters stay constant across all 1331 snapshots, so kswapd has
neither scanned nor reclaimed any memory throughout the whole collected
time span. On the other hand, we can see quite some direct reclaim
activity:

$ awk '/pgsteal_direct/ {val=$2+0; ln++; if (last && val-last > 0) {printf("%d %d\n", ln, val-last)} last=val}' vmstat | head
17 1058
18 9773
19 1036
24 11413
49 1055
50 1050
51 17938
52 22665
53 29400
54 5997

So there is a steady source of direct reclaim, which is quite unexpected
considering that the background reclaim is inactive. Or maybe it is
blocked and not able to make forward progress.

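For anyone who would rather post-process the snapshots outside of awk,
the per-snapshot delta computation above can be sketched roughly as
follows. This is only an illustration under stated assumptions: the log
is taken to be concatenated /proc/vmstat snapshots with one
"name value" pair per line, the helper names (counter_deltas,
pages_to_gib) are hypothetical, and the 4 KiB page size is an assumption
that holds for typical x86_64 setups but should be checked on the
machine in question. Unlike the awk pattern, which matches the counter
name anywhere in the line, the sketch matches the name exactly.

```python
from typing import Iterable, Iterator, Tuple

# Assumption: 4 KiB base pages (typical for x86_64, not taken from the logs).
PAGE_SIZE = 4096


def counter_deltas(lines: Iterable[str], counter: str) -> Iterator[Tuple[int, int]]:
    """Yield (sample_number, increase) for each sample where `counter` grew.

    Mirrors the awk one-liner: samples are numbered from 1 per matching
    line, and a delta is reported only when the previous value was
    non-zero and the counter increased.
    """
    last = 0
    sample = 0
    for line in lines:
        fields = line.split()
        # Exact name match; lines that are not "name value" are skipped.
        if len(fields) != 2 or fields[0] != counter:
            continue
        sample += 1
        value = int(fields[1])
        if last and value > last:
            yield sample, value - last
        last = value


def pages_to_gib(pages: int, page_size: int = PAGE_SIZE) -> float:
    """Convert a page count to GiB for the given page size."""
    return pages * page_size / 2**30


# Made-up sample data, only to show the call shape.
demo_log = [
    "pgsteal_direct 100",
    "pgscan_direct 5",
    "pgsteal_direct 100",
    "pgsteal_direct 150",
]
for sample, delta in counter_deltas(demo_log, "pgsteal_direct"):
    print(sample, delta, f"{pages_to_gib(delta):.6f} GiB")
```

On a real log this would be called as
counter_deltas(open("vmstat"), "pgsteal_direct"), and summing the
yielded deltas gives the total number of pages reclaimed directly over
the collected time span.
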
780513 pages have been reclaimed, which is roughly 3G worth of memory
and matches the drop you are seeing AFAICS.

$ grep allocstall_dma32 vmstat | uniq -c
   1331 allocstall_dma32 0
$ grep allocstall_normal vmstat | uniq -c
   1331 allocstall_normal 39

Neither counter changes over the collected time span, so no direct
reclaim has been invoked for the DMA32 and Normal zones. But the Movable
zone seems to be the source of the direct reclaim:

$ awk '/allocstall_movable/ {val=$2+0; ln++; if (last && val-last > 0) {printf("%d %d\n", ln, val-last)} last=val}' vmstat | head
17 1
18 9
19 1
24 10
49 1
50 1
51 17
52 20
53 28
54 5

and that matches the moments when we reclaimed memory. There seems to be
a steady flow of THP allocations, so maybe this is the source of the
direct reclaim?
-- 
Michal Hocko
SUSE Labs