[Cc Johannes. The collected vmstat data is in
http://lkml.kernel.org/r/1579844599463.32567@xxxxxxxxxxx]

On Fri 24-01-20 05:43:19, Chris Edwards wrote:
> > Could you collect /proc/vmstat every second or so while you observe this
> > behavior? This should give us more information than vmstat(8) output.
>
> Hi Michal,
>
> Thanks for the suggestion - I've re-run the test on a 5.5.0-rc6 kernel
> built from source using the default config, which exhibits the same
> behaviour. Please see attachment; I hope the format is OK.

I personally would have found one snapshot per file slightly easier to
parse, but no problem (I have simply broken out the counters per file).
In future the following would be easier to process ;)

while true
do
	TS="$(date +%s)"
	cp /proc/vmstat vmstat.$TS
	sleep 1s
done

> Here's the timeline of events:
> 18:25:00 start
> 18:25:10 run `stress` to limit available memory (grabs 0.9 x MemAvailable)

I assume this will allocate anonymous memory.

time 18:25:10
nr_free_pages 2934822
nr_inactive_anon 57550
nr_active_anon 5733
nr_inactive_file 1428
nr_active_file 21857
nr_unevictable 6102
pswpin 8
pswpout 390136

So there is 11GB of free memory, and 1.5GB of memory was swapped out in
the past (probably a result of previous tests). We will use this number
as a baseline for later comparison because the pswpout counter is
cumulative. The anonymous LRUs hold 240MB of memory and there is 90MB of
file backed memory.

> 18:25:20 run `dd` to exercise the buffer cache

time 18:25:20
nr_free_pages 367818
nr_inactive_anon 57693
nr_active_anon 2560480
nr_inactive_file 7110
nr_active_file 23332
nr_unevictable 6195
pswpin 8
pswpout 390136

The free memory dropped to 1.4GB as a result of your `stress` load. All
that memory landed in the anonymous LRU lists (~10GB of memory compared
to 240MB before the test). File backed memory has grown to 118MB. No
swapout/in during that time period. Nothing really unexpected so far.
There is still quite some room to fit the IO workload in.
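For anybody who wants to double check the sizes quoted above, here is a
minimal sketch of the page count to MiB conversion. It assumes 4kB pages
(verify with `getconf PAGESIZE`); the helper name pages_to_mib is made up
for illustration only.

```shell
#!/bin/sh
# Convert /proc/vmstat page counts into MiB.
# Assumption: 4096-byte pages; check with `getconf PAGESIZE`.
PAGE_SIZE=4096

# pages_to_mib is a hypothetical helper, not an existing tool.
# $1 = number of pages; prints whole MiB (integer division).
pages_to_mib() {
    echo $(( $1 * PAGE_SIZE / 1024 / 1024 ))
}

# Counters from the 18:25:20 snapshot
pages_to_mib 367818                   # nr_free_pages
pages_to_mib $(( 57693 + 2560480 ))   # inactive + active anon
pages_to_mib $(( 7110 + 23332 ))      # inactive + active file
```

The results are rounded down to whole MiB, so expect the figures in the
text to differ slightly where I have rounded to the nearest GB.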

Let's see how pswpout evolves over time. The columns are sample number,
counter value and the difference from the previous sample.

$ awk '{diff=$1-prev; if (prev&&diff) printf "%d %d %d\n", NR, $1, diff; prev=$1}' pswpout
30 392136 2000
31 395513 3377
32 399132 3619
33 403101 3969
34 407211 4110
35 410812 3601
36 414120 3308
37 418119 3999
38 422116 3997
39 424154 2038
40 428110 3956

So the swapout started around 18:25:30 (sample 30, one sample per
second).

$ sed '1,28d;' nr_free_pages | head
118413
100516
98751
95914
97059
101303
101588
97801
99415
99842

The free memory dropped down to ~400MB, which is likely the
min_free_kbytes defined watermark.

$ sed '1,28d;' nr_inactive_anon | head -n3
57633
57828
58932
$ sed '1,28d;' nr_active_anon | head -n3
2560522
2560148
2559087

Anonymous lists are around 10GB.

$ sed '1,28d;' nr_inactive_file | head -n3
255957
276400
278865
$ sed '1,28d;' nr_active_file | head -n3
23334
23439
22743

File lists are 1.1GB. The inactive file LRU is quite large and

$ sed '1,28d;' nr_dirty | head -n3
0
0
0
$ sed '1,28d;' nr_writeback | head -n3
0
0
141

the data shouldn't be dirty, so we should preferably reclaim those pages
rather than swap out. That is a little bit surprising to me. Johannes,
what do you think about this?

> 18:26:00 echo 3 > /proc/sys/vm/drop_caches
> 18:26:30 echo 3 > /proc/sys/vm/drop_caches
>
> The paging stops after each drop_caches, but starts again once the
> buffer cache utilisation rises.

-- 
Michal Hocko
SUSE Labs