On Fri 27-07-18 13:15:33, Vlastimil Babka wrote:
> On 07/21/2018 12:03 AM, Marinko Catovic wrote:
> > I let this run for 3 days now, so it is quite a lot, there you go:
> > https://nofile.io/f/egGyRjf0NPs/vmstat.tar.gz
>
> The stats show that compaction has very bad results. Between the first
> and last snapshot, compact_fail grew by 80k and compact_success by 1300.
> High-order allocations will thus cycle between (failing) compaction and
> reclaim that removes the buffers/caches from memory.

I guess you are right. I've just looked at random large direct reclaim
activity:

$ grep -w pgscan_direct vmstat* | awk '{diff=$2-old; if (old && diff > 100000) printf "%s %d\n", $1, diff; old=$2}'
vmstat.1531957422:pgscan_direct 114334
vmstat.1532047588:pgscan_direct 111796

$ paste-with-diff.sh vmstat.1532047578 vmstat.1532047588 | grep "pgscan\|pgsteal\|compact\|pgalloc" | sort
# counter                        value1        value2-value1
compact_daemon_free_scanned      2628160139    0
compact_daemon_migrate_scanned   797948703     0
compact_daemon_wake              23634         0
compact_fail                     124806        108
compact_free_scanned             226181616304  295560271
compact_isolated                 2881602028    480577
compact_migrate_scanned          147900786550  27834455
compact_stall                    146749        108
compact_success                  21943         0
pgalloc_dma                      0             0
pgalloc_dma32                    1577060946    10752
pgalloc_movable                  0             0
pgalloc_normal                   29389246430   343249
pgscan_direct                    737335028     111796
pgscan_direct_throttle           0             0
pgscan_kswapd                    1177909394    0
pgsteal_direct                   704542843     111784
pgsteal_kswapd                   898170720     0

There is zero kswapd activity, so this must have been higher-order
allocation activity, and all the direct compaction failed, so we keep
reclaiming.
--
Michal Hocko
SUSE Labs
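
[Editor's note: paste-with-diff.sh is Michal's private helper, not a
standard tool. A minimal stand-in with the same observable behavior,
assuming both arguments are /proc/vmstat snapshots with one
"counter value" pair per line, could look like this:]

```shell
#!/bin/sh
# Hypothetical equivalent of paste-with-diff.sh: pair up the counters
# from two /proc/vmstat snapshots and print, for each counter, its
# value in the first snapshot and the second-minus-first delta.
# NR == FNR is true only while awk reads the first file, so we cache
# its values in v[]; for the second file we look each counter up and
# emit the difference.
awk 'NR == FNR { v[$1] = $2; next }
     $1 in v  { printf "%s %s %d\n", $1, v[$1], $2 - v[$1] }' "$1" "$2"
```

[Used as in the email, e.g. `paste-with-diff.sh vmstat.1532047578
vmstat.1532047588`, a zero in the last column means the counter did not
move between the two snapshots.]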