The issue has been observed on 4GB RAM x86_64 machines (one server, one desktop) without a swap subsystem (it's not even compiled in). The important thing to remember about a 4GB x86_64 machine is that the NORMAL zone is about 6 times smaller than the DMA32 zone.
As a picture is worth 10000 words, I've attached two graphs that nicely show what I've observed. As memory usage slowly rises, the MM subsystem gradually evicts pagecache pages from the NORMAL zone, trying to eventually get rid of all of them! This process takes days, typically more than 5 on this particular server. Of course, this means that eventually the zone will be chock-full of anon pages, and without swap, the kernel can't do much about it. But as it tries to balance the zone, various bad things happen. On the server I've seen sudden freeing of hundreds of MB of pagecache; on the desktop there's a general slowdown, sound dropouts (HTTP streaming) and so on.
The first graph was probably taken on a 3.8 kernel, the second one on 3.9.0-rc4+ patched with the kswapd series v2. Obviously not much has changed wrt this problem, although it seems to me that the kernel is now more hesitant to free large amounts of memory needlessly, or does so less often. But on the desktop there's no improvement: as soon as the pagecache gets really low in the NORMAL zone, there's severe slowdown, dropouts, etc. One other thing: the lower graphs say "Normal zone file pages", but what is actually graphed is nr_active_file + nr_inactive_file from the NORMAL zone!
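For anyone who wants to reproduce the quantity in the lower graphs, here is a minimal sketch of how nr_active_file + nr_inactive_file could be summed per zone. It is not the script that produced the graphs. It assumes the one-counter-per-line /proc/zoneinfo layout of the 3.x kernels discussed here (newer kernels may name or place these counters differently), and the names file_lru_pages and SAMPLE are made up for illustration. The sample numbers are copied from the first attached zoneinfo output.

```python
import re

# Trimmed excerpt in /proc/zoneinfo layout; values taken from the first
# attached zoneinfo output (DMA32 and Normal zones only).
SAMPLE = """\
Node 0, zone    DMA32
  pages free     135587
    nr_inactive_file 190424
    nr_active_file 198798
Node 0, zone   Normal
  pages free     7954
    nr_inactive_file 32
    nr_active_file 0
"""

ZONE_HEADER = re.compile(r"Node\s+(\d+),\s+zone\s+(\S+)")
FILE_LRU_COUNTERS = ("nr_active_file", "nr_inactive_file")


def file_lru_pages(zoneinfo_text):
    """Return {(node, zone): nr_active_file + nr_inactive_file}."""
    totals = {}
    current = None  # key of the zone whose counters we are reading
    for line in zoneinfo_text.splitlines():
        header = ZONE_HEADER.match(line)
        if header:
            current = (int(header.group(1)), header.group(2))
            totals[current] = 0
            continue
        fields = line.split()
        # Counter lines look like "nr_active_file 198798".
        if current is not None and len(fields) == 2 and fields[0] in FILE_LRU_COUNTERS:
            totals[current] += int(fields[1])
    return totals


for (node, zone), pages in file_lru_pages(SAMPLE).items():
    print(f"node {node} zone {zone}: {pages} file LRU pages")
```

On a live system the same function can be fed the contents of /proc/zoneinfo instead of SAMPLE; sampling it periodically gives the kind of curve shown in the graphs.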
I've also attached two zoneinfo outputs. Notice how the DMA32 zones have hundreds of thousands of pagecache pages, but only a few dozen are in the NORMAL zone! Also, nr_vmscan_write is telling: the values are much higher for zone NORMAL (especially when you take into account how little pagecache is there!). I guess those poor pagecache pages that survive there get written a millisecond after they're dirtied, a probable cause of the slowdown I experience on the desktop.
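To put a number on "how little pagecache is there", one rough option is to divide nr_vmscan_write by the file LRU pages of the same zone. The sketch below does that under the same assumption as before (one counter per line, 3.x-era counter names); zone_counters and vmscan_writes_per_file_page are hypothetical helper names, and the ratio is only an illustrative metric, not something the kernel reports. The sample values come from the second attached zoneinfo output.

```python
import re

# Trimmed excerpt in /proc/zoneinfo layout; values taken from the second
# attached zoneinfo output.
SAMPLE = """\
Node 0, zone    DMA32
    nr_inactive_file 159143
    nr_active_file 148277
    nr_vmscan_write 203475
Node 0, zone   Normal
    nr_inactive_file 121
    nr_active_file 33
    nr_vmscan_write 12486086
"""

ZONE_HEADER = re.compile(r"Node\s+(\d+),\s+zone\s+(\S+)")


def zone_counters(zoneinfo_text):
    """Map (node, zone) -> {counter_name: value} for the nr_* lines."""
    zones = {}
    current = None
    for line in zoneinfo_text.splitlines():
        header = ZONE_HEADER.match(line)
        if header:
            key = (int(header.group(1)), header.group(2))
            current = zones.setdefault(key, {})
            continue
        fields = line.split()
        if current is not None and len(fields) == 2 and fields[0].startswith("nr_"):
            current[fields[0]] = int(fields[1])
    return zones


def vmscan_writes_per_file_page(counters):
    """Cumulative nr_vmscan_write divided by current file LRU pages."""
    file_pages = counters.get("nr_active_file", 0) + counters.get("nr_inactive_file", 0)
    # Guard against a zone with no file pages at all.
    return counters.get("nr_vmscan_write", 0) / max(file_pages, 1)


for (node, zone), counters in zone_counters(SAMPLE).items():
    ratio = vmscan_writes_per_file_page(counters)
    print(f"node {node} zone {zone}: {ratio:.2f} vmscan writes per file page")
```

Note that nr_vmscan_write is cumulative since boot while the page counts are a snapshot, so the ratio is only useful for comparing zones on the same machine, not as an absolute figure.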
There's a reasonable possibility that this imbalance between zones was introduced somewhere between 3.3 and 3.4, because the VM behaves slightly differently in 3.3 (it doesn't evict pagecache from the NORMAL zone so aggressively). Unfortunately, I have some userspace incompatibilities when running 3.3, so I'm not 100% sure (I didn't run it long enough to be absolutely sure). I tried to find the problematic commit, and cc715d99e529 certainly looked like the culprit, but it's not! buffer_heads_over_limit is NEVER true on the machine, not even close, so that commit is basically a noop. Also, it doesn't matter whether THP is on or off; the behaviour stays the same.
My apologies for the long email, I tried to provide as much information as possible.
-- Zlatko
Attachment: server-3.8.png (PNG image)
Attachment: server-kswapd-v2.png (PNG image)
Node 0, zone      DMA
  pages free     3974
        min      128
        low      160
        high     192
        scanned  0
        spanned  4080
        present  3912
        managed  3976
    nr_free_pages 3974
    nr_inactive_anon 0
    nr_active_anon 0
    nr_inactive_file 0
    nr_active_file 0
    nr_unevictable 0
    nr_mlock     0
    nr_anon_pages 0
    nr_mapped    0
    nr_file_pages 0
    nr_dirty     0
    nr_writeback 0
    nr_slab_reclaimable 0
    nr_slab_unreclaimable 2
    nr_page_table_pages 0
    nr_kernel_stack 0
    nr_unstable  0
    nr_bounce    0
    nr_vmscan_write 0
    nr_vmscan_immediate_reclaim 0
    nr_writeback_temp 0
    nr_isolated_anon 0
    nr_isolated_file 0
    nr_shmem     0
    nr_dirtied   0
    nr_written   0
    nr_anon_transparent_hugepages 0
    nr_free_cma  0
        protection: (0, 3259, 4015, 4015)
  pagesets
    cpu: 0
              count: 0
              high:  0
              batch: 1
  vm stats threshold: 6
    cpu: 1
              count: 0
              high:  0
              batch: 1
  vm stats threshold: 6
    cpu: 2
              count: 0
              high:  0
              batch: 1
  vm stats threshold: 6
    cpu: 3
              count: 0
              high:  0
              batch: 1
  vm stats threshold: 6
  all_unreclaimable: 1
  start_pfn:         16
  inactive_ratio:    1
Node 0, zone    DMA32
  pages free     135587
        min      27326
        low      34157
        high     40989
        scanned  0
        spanned  1044480
        present  834513
        managed  828967
    nr_free_pages 135587
    nr_inactive_anon 8165
    nr_active_anon 264237
    nr_inactive_file 190424
    nr_active_file 198798
    nr_unevictable 1
    nr_mlock     1
    nr_anon_pages 219052
    nr_mapped    33586
    nr_file_pages 397576
    nr_dirty     82
    nr_writeback 0
    nr_slab_reclaimable 21757
    nr_slab_unreclaimable 3505
    nr_page_table_pages 3293
    nr_kernel_stack 134
    nr_unstable  0
    nr_bounce    0
    nr_vmscan_write 0
    nr_vmscan_immediate_reclaim 0
    nr_writeback_temp 0
    nr_isolated_anon 0
    nr_isolated_file 0
    nr_shmem     8354
    nr_dirtied   5734689
    nr_written   5557592
    nr_anon_transparent_hugepages 88
    nr_free_cma  0
        protection: (0, 0, 756, 756)
  pagesets
    cpu: 0
              count: 181
              high:  186
              batch: 31
  vm stats threshold: 36
    cpu: 1
              count: 103
              high:  186
              batch: 31
  vm stats threshold: 36
    cpu: 2
              count: 154
              high:  186
              batch: 31
  vm stats threshold: 36
    cpu: 3
              count: 149
              high:  186
              batch: 31
  vm stats threshold: 36
  all_unreclaimable: 0
  start_pfn:         4096
  inactive_ratio:    5
Node 0, zone   Normal
  pages free     7954
        min      6337
        low      7921
        high     9505
        scanned  0
        spanned  196608
        present  193536
        managed  178447
    nr_free_pages 7954
    nr_inactive_anon 1916
    nr_active_anon 136297
    nr_inactive_file 32
    nr_active_file 0
    nr_unevictable 7767
    nr_mlock     7767
    nr_anon_pages 118628
    nr_mapped    3090
    nr_file_pages 3784
    nr_dirty     4
    nr_writeback 0
    nr_slab_reclaimable 5476
    nr_slab_unreclaimable 5581
    nr_page_table_pages 2785
    nr_kernel_stack 254
    nr_unstable  0
    nr_bounce    0
    nr_vmscan_write 2693969
    nr_vmscan_immediate_reclaim 10529
    nr_writeback_temp 0
    nr_isolated_anon 0
    nr_isolated_file 0
    nr_shmem     2348
    nr_dirtied   1912471
    nr_written   1784816
    nr_anon_transparent_hugepages 46
    nr_free_cma  0
        protection: (0, 0, 0, 0)
  pagesets
    cpu: 0
              count: 151
              high:  186
              batch: 31
  vm stats threshold: 24
    cpu: 1
              count: 171
              high:  186
              batch: 31
  vm stats threshold: 24
    cpu: 2
              count: 143
              high:  186
              batch: 31
  vm stats threshold: 24
    cpu: 3
              count: 54
              high:  186
              batch: 31
  vm stats threshold: 24
  all_unreclaimable: 0
  start_pfn:         1048576
  inactive_ratio:    1
Node 0, zone      DMA
  pages free     3975
        min      132
        low      165
        high     198
        scanned  0
        spanned  4080
        present  3983
        managed  3977
    nr_free_pages 3975
    nr_inactive_anon 0
    nr_active_anon 0
    nr_inactive_file 0
    nr_active_file 0
    nr_unevictable 0
    nr_mlock     0
    nr_anon_pages 0
    nr_mapped    0
    nr_file_pages 0
    nr_dirty     0
    nr_writeback 0
    nr_slab_reclaimable 0
    nr_slab_unreclaimable 2
    nr_page_table_pages 0
    nr_kernel_stack 0
    nr_unstable  0
    nr_bounce    0
    nr_vmscan_write 0
    nr_vmscan_immediate_reclaim 0
    nr_writeback_temp 0
    nr_isolated_anon 0
    nr_isolated_file 0
    nr_shmem     0
    nr_dirtied   0
    nr_written   0
    nr_anon_transparent_hugepages 0
    nr_free_cma  0
        protection: (0, 3236, 3934, 3934)
  pagesets
    cpu: 0
              count: 0
              high:  0
              batch: 1
  vm stats threshold: 4
    cpu: 1
              count: 0
              high:  0
              batch: 1
  vm stats threshold: 4
  all_unreclaimable: 1
  start_pfn:         16
  inactive_ratio:    1
Node 0, zone    DMA32
  pages free     198806
        min      27693
        low      34616
        high     41539
        scanned  0
        spanned  1044480
        present  847429
        managed  828646
    nr_free_pages 198806
    nr_inactive_anon 152
    nr_active_anon 296082
    nr_inactive_file 159143
    nr_active_file 148277
    nr_unevictable 0
    nr_mlock     0
    nr_anon_pages 212100
    nr_mapped    30139
    nr_file_pages 325028
    nr_dirty     61
    nr_writeback 0
    nr_slab_reclaimable 23373
    nr_slab_unreclaimable 1418
    nr_page_table_pages 1044
    nr_kernel_stack 55
    nr_unstable  0
    nr_bounce    0
    nr_vmscan_write 203475
    nr_vmscan_immediate_reclaim 1159794
    nr_writeback_temp 0
    nr_isolated_anon 0
    nr_isolated_file 0
    nr_shmem     17608
    nr_dirtied   120403187
    nr_written   119379429
    nr_anon_transparent_hugepages 130
    nr_free_cma  0
        protection: (0, 0, 697, 697)
  pagesets
    cpu: 0
              count: 121
              high:  186
              batch: 31
  vm stats threshold: 24
    cpu: 1
              count: 107
              high:  186
              batch: 31
  vm stats threshold: 24
  all_unreclaimable: 0
  start_pfn:         4096
  inactive_ratio:    5
Node 0, zone   Normal
  pages free     7449
        min      5965
        low      7456
        high     8947
        scanned  0
        spanned  196607
        present  196607
        managed  178497
    nr_free_pages 7449
    nr_inactive_anon 280
    nr_active_anon 149997
    nr_inactive_file 121
    nr_active_file 33
    nr_unevictable 0
    nr_mlock     0
    nr_anon_pages 138419
    nr_mapped    2050
    nr_file_pages 2796
    nr_dirty     4
    nr_writeback 0
    nr_slab_reclaimable 2388
    nr_slab_unreclaimable 2284
    nr_page_table_pages 1203
    nr_kernel_stack 156
    nr_unstable  0
    nr_bounce    0
    nr_vmscan_write 12486086
    nr_vmscan_immediate_reclaim 1290613
    nr_writeback_temp 0
    nr_isolated_anon 0
    nr_isolated_file 0
    nr_shmem     2642
    nr_dirtied   16946001
    nr_written   16543553
    nr_anon_transparent_hugepages 18
    nr_free_cma  0
        protection: (0, 0, 0, 0)
  pagesets
    cpu: 0
              count: 93
              high:  186
              batch: 31
  vm stats threshold: 16
    cpu: 1
              count: 114
              high:  186
              batch: 31
  vm stats threshold: 16
  all_unreclaimable: 0
  start_pfn:         1048576
  inactive_ratio:    1