On 22.07.2013 19:01, Johannes Weiner wrote:
> Hi Zlatko,
>
> On Mon, Jul 22, 2013 at 06:48:52PM +0200, Zlatko Calusic wrote:
>> On 19.07.2013 22:55, Johannes Weiner wrote:
>>> The way the page allocator interacts with kswapd creates aging
>>> imbalances, where the amount of time a userspace page gets in memory
>>> under reclaim pressure is dependent on which zone, which node the
>>> allocator took the page frame from.
>>>
>>> #1 fixes missed kswapd wakeups on NUMA systems, which lead to some
>>> nodes falling behind for a full reclaim cycle relative to the other
>>> nodes in the system
>>>
>>> #3 fixes an interaction where kswapd and a continuous stream of page
>>> allocations keep the preferred zone of a task between the high and
>>> low watermark (allocations succeed + kswapd does not go to sleep)
>>> indefinitely, completely underutilizing the lower zones and
>>> thrashing on the preferred zone
>>>
>>> These patches are the aging fairness part of the thrash-detection
>>> based file LRU balancing. Andrea recommended submitting them
>>> separately as they are bugfixes in their own right.
>>
>> I have the patch applied and under testing. So far, so good. It
>> looks like it could finally fix the bug that I was chasing a few
>> months ago (nicely described in your bullet #3). But a few more days
>> of testing will be needed before I can reach a quality verdict.
>
> I should have remembered that you talked about this problem... Thanks
> a lot for testing!
>
> May I ask for the zone layout of your test machine(s)? I.e. how many
> nodes if NUMA, how big Normal and DMA32 (on Node 0) are.

I have been reading about NUMA hw for at least a decade, but I guess
another one will pass before I actually see one. ;) Find /proc/zoneinfo
attached.
If your patchset fails my case, then nr_{in,}active_file in the Normal zone
will drop close to zero in a matter of days. If it fixes this particular
imbalance, and I have faith it will, then those two counters will stay
in relative balance with nr_{in,}active_anon in the same zone. I have also
applied Konstantin's excellent lru-milestones-timestamps-and-ages patch and
am graphing the interesting numbers on top of it, which is why I already
have faith in your patchset. I can already see much better balance between
zones. But let's give it some more time...
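
For anyone who wants to watch the same counters, here is a minimal sketch of
the check described above: it sums nr_{in,}active_file and
nr_{in,}active_anon per zone from zoneinfo text. The helper names
(parse_zoneinfo, lru_balance, report) are made up for illustration, and the
field names are assumed to match the attached dump; other kernel versions
may name or place these counters differently.

```python
import re

# Matches zone headers such as "Node 0, zone   Normal".
ZONE_HEADER = re.compile(r"^Node\s+(\d+),\s+zone\s+(\S+)")
# Matches counter lines such as "    nr_active_file 44925".
COUNTER = re.compile(r"^\s*(nr_\w+)\s+(\d+)\s*$")


def parse_zoneinfo(text):
    """Return {(node, zone): {counter: value}} for all nr_* lines."""
    zones = {}
    current = None
    for line in text.splitlines():
        header = ZONE_HEADER.match(line)
        if header:
            current = (int(header.group(1)), header.group(2))
            zones[current] = {}
            continue
        counter = COUNTER.match(line)
        if counter and current is not None:
            zones[current][counter.group(1)] = int(counter.group(2))
    return zones


def lru_balance(counters):
    """Return (file LRU pages, anon LRU pages) for one zone."""
    file_pages = (counters.get("nr_inactive_file", 0)
                  + counters.get("nr_active_file", 0))
    anon_pages = (counters.get("nr_inactive_anon", 0)
                  + counters.get("nr_active_anon", 0))
    return file_pages, anon_pages


def report(text):
    """Build one summary line per zone with the file/anon LRU ratio."""
    lines = []
    for (node, zone), counters in parse_zoneinfo(text).items():
        file_pages, anon_pages = lru_balance(counters)
        ratio = file_pages / anon_pages if anon_pages else float("nan")
        lines.append(f"node {node} {zone:8s} file={file_pages} "
                     f"anon={anon_pages} file/anon={ratio:.2f}")
    return lines
```

Feeding it the contents of /proc/zoneinfo (for example
report(open("/proc/zoneinfo").read())) once a day and logging the lines
should be enough to see whether the file LRU in the Normal zone shrinks
relative to the anon LRU over time.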
--
Zlatko
Node 0, zone DMA
pages free 3975
min 132
low 165
high 198
scanned 0
spanned 4095
present 3998
managed 3977
nr_free_pages 3975
nr_inactive_anon 0
nr_active_anon 0
nr_inactive_file 0
nr_active_file 0
nr_unevictable 0
nr_mlock 0
nr_anon_pages 0
nr_mapped 0
nr_file_pages 0
nr_dirty 0
nr_writeback 0
nr_slab_reclaimable 0
nr_slab_unreclaimable 2
nr_page_table_pages 0
nr_kernel_stack 0
nr_unstable 0
nr_bounce 0
nr_vmscan_write 0
nr_vmscan_immediate_reclaim 0
nr_writeback_temp 0
nr_isolated_anon 0
nr_isolated_file 0
nr_shmem 0
nr_dirtied 0
nr_written 0
nr_anon_transparent_hugepages 0
nr_free_cma 0
protection: (0, 3236, 3933, 3933)
pagesets
cpu: 0
count: 0
high: 0
batch: 1
vm stats threshold: 4
cpu: 1
count: 0
high: 0
batch: 1
vm stats threshold: 4
all_unreclaimable: 1
start_pfn: 1
inactive_ratio: 1
avg_age_inactive_anon: 0
avg_age_active_anon: 0
avg_age_inactive_file: 0
avg_age_active_file: 0
Node 0, zone DMA32
pages free 83177
min 27693
low 34616
high 41539
scanned 0
spanned 1044480
present 847429
managed 829295
nr_free_pages 83177
nr_inactive_anon 2061
nr_active_anon 313380
nr_inactive_file 199460
nr_active_file 207097
nr_unevictable 0
nr_mlock 0
nr_anon_pages 239688
nr_mapped 38888
nr_file_pages 424978
nr_dirty 87
nr_writeback 0
nr_slab_reclaimable 9119
nr_slab_unreclaimable 2054
nr_page_table_pages 1795
nr_kernel_stack 144
nr_unstable 0
nr_bounce 0
nr_vmscan_write 0
nr_vmscan_immediate_reclaim 0
nr_writeback_temp 0
nr_isolated_anon 0
nr_isolated_file 0
nr_shmem 18421
nr_dirtied 725414
nr_written 768505
nr_anon_transparent_hugepages 112
nr_free_cma 0
protection: (0, 0, 697, 697)
pagesets
cpu: 0
count: 132
high: 186
batch: 31
vm stats threshold: 24
cpu: 1
count: 146
high: 186
batch: 31
vm stats threshold: 24
all_unreclaimable: 0
start_pfn: 4096
inactive_ratio: 5
avg_age_inactive_anon: 5467648
avg_age_active_anon: 5467648
avg_age_inactive_file: 3184128
avg_age_active_file: 5467648
Node 0, zone Normal
pages free 17164
min 5965
low 7456
high 8947
scanned 0
spanned 196607
present 196607
managed 178491
nr_free_pages 17164
nr_inactive_anon 294
nr_active_anon 64754
nr_inactive_file 42191
nr_active_file 44925
nr_unevictable 0
nr_mlock 0
nr_anon_pages 51456
nr_mapped 9580
nr_file_pages 91492
nr_dirty 27
nr_writeback 0
nr_slab_reclaimable 2686
nr_slab_unreclaimable 1194
nr_page_table_pages 401
nr_kernel_stack 65
nr_unstable 0
nr_bounce 0
nr_vmscan_write 0
nr_vmscan_immediate_reclaim 0
nr_writeback_temp 0
nr_isolated_anon 0
nr_isolated_file 0
nr_shmem 4376
nr_dirtied 163250
nr_written 172369
nr_anon_transparent_hugepages 18
nr_free_cma 0
protection: (0, 0, 0, 0)
pagesets
cpu: 0
count: 177
high: 186
batch: 31
vm stats threshold: 16
cpu: 1
count: 170
high: 186
batch: 31
vm stats threshold: 16
all_unreclaimable: 0
start_pfn: 1048576
inactive_ratio: 1
avg_age_inactive_anon: 5468672
avg_age_active_anon: 5468672
avg_age_inactive_file: 3382628
avg_age_active_file: 5468672
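
To put the zone sizes asked about above into more familiar units, the
"managed" page counts from the dump can be converted to MiB. This is only a
rough sketch: the 4 KiB page size is an assumption (the usual base page size
on x86-64, which the presence of a DMA32 zone suggests), and the names
PAGE_SIZE, MANAGED_PAGES, pages_to_mib and zone_sizes_mib are illustrative.

```python
PAGE_SIZE = 4096  # bytes per page; assumed, not stated in the dump

# "managed" values copied from the Node 0 zones in the dump above.
MANAGED_PAGES = {"DMA": 3977, "DMA32": 829295, "Normal": 178491}


def pages_to_mib(pages, page_size=PAGE_SIZE):
    """Convert a page count to MiB for the given page size."""
    return pages * page_size / (1024 * 1024)


def zone_sizes_mib(managed):
    """Map each zone name to its managed size in MiB."""
    return {zone: pages_to_mib(pages) for zone, pages in managed.items()}


for zone, mib in zone_sizes_mib(MANAGED_PAGES).items():
    print(f"{zone:6s} {mib:8.1f} MiB")
```

Under that assumption the single node has roughly 15.5 MiB in DMA, about
3239 MiB in DMA32 and about 697 MiB in Normal, so Normal is by far the
smaller of the two zones that carry user pages.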