Linux choosing to swap despite having 250G of file memory available

I have a question about Linux behavior that I do not currently understand. This is the situation:

The server has 1T of memory, of which 700G is allocated to hugepages (hugepage size 1G).
This leaves 300G of memory in smallpages, to which I assume Linux applies its general memory management behaviour.

Of the smallpages memory, I see more than 250G classified as file memory, and only about 15G allocated to anon (anonymous memory).
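As a rough cross-check of these figures, the following Python sketch does the arithmetic on values copied from the /proc/meminfo output included further down in this post. It assumes that MemTotal includes the hugepage pool, so that subtracting Hugetlb gives the smallpages memory; the dictionary and helper names are only illustrative.

```python
# Values copied from the /proc/meminfo output in this post (all in kB).
meminfo_kb = {
    "MemTotal": 1055737556,
    "Hugetlb": 738197504,
    "Active(file)": 26083440,
    "Inactive(file)": 261190340,
    "Active(anon)": 3788932,
    "Inactive(anon)": 8167304,
}

KB_PER_GIB = 1024 * 1024


def gib(kb):
    """Convert a kB value, as printed by /proc/meminfo, to GiB."""
    return kb / KB_PER_GIB


# Assumption: MemTotal includes the memory reserved for hugepages.
small_page_kb = meminfo_kb["MemTotal"] - meminfo_kb["Hugetlb"]
file_kb = meminfo_kb["Active(file)"] + meminfo_kb["Inactive(file)"]
anon_kb = meminfo_kb["Active(anon)"] + meminfo_kb["Inactive(anon)"]

print(f"smallpages memory: {gib(small_page_kb):.1f} GiB")
print(f"file LRU pages:    {gib(file_kb):.1f} GiB")
print(f"anon LRU pages:    {gib(anon_kb):.1f} GiB")
```

The anon figure here only counts pages currently resident on the LRU lists; memory already swapped out is not included.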

The load on the server is caused by a postgres database instance with, on average, 80 active sessions, of which a varying number are performing read and write IO. Postgres performs buffered reads and writes using the pread64() and pwrite64() calls, and always performs IO using an IO size of 8KB.

However, it should be noted that postgres can also use posix_fadvise() with POSIX_FADV_WILLNEED to make the OS pre-read blocks.
There might also be independent asynchronous IO via direct path, but I have not been informed about how exactly that works. That IO might be on the same postgres files that the regular pread64() and pwrite64() calls operate on, but those asynchronous calls are not part of open source postgres.
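To make the buffered IO pattern described above concrete, here is a minimal Python sketch of an 8KB buffered read combined with a POSIX_FADV_WILLNEED hint for the following block. This only illustrates the system calls involved and is not postgres code; the function name, the `prefetch_blocks` parameter and the choice to hint the next block are assumptions made for the example.

```python
import os

BLOCK_SIZE = 8192  # 8KB, the IO size mentioned above


def read_block_with_prefetch(fd, block_no, prefetch_blocks=1):
    """Read one 8KB block through the page cache and hint the next blocks.

    The hint asks the kernel to start reading the following blocks into
    the page cache; it is advisory and returns without waiting for the IO.
    """
    offset = block_no * BLOCK_SIZE
    if prefetch_blocks > 0 and hasattr(os, "posix_fadvise"):
        os.posix_fadvise(
            fd,
            offset + BLOCK_SIZE,           # start of the range to pre-read
            prefetch_blocks * BLOCK_SIZE,  # length of the range
            os.POSIX_FADV_WILLNEED,
        )
    # Buffered positional read; this is the pread64() system call on x86_64 Linux.
    return os.pread(fd, BLOCK_SIZE, offset)
```

Both the read and the hint populate the page cache, which is the file memory discussed in this post.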

The amount of IO that is taking place is also noteworthy: using the iotop utility I can see both total and actual disk reads and writes going up to 3 GBPS for reads and up to 500 MBPS for writes.

The question I have is why Linux chooses to swap despite having lots of file memory, which it reports (via MemAvailable) as being available.
I need more tools on this machine, but I do not have the impression that the swapping is severely affecting sessions, although top (with the swap field added) shows that every postgres database process has swapped-out memory.
It also does not seem healthy to have swapping in and out going on continuously.
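In the absence of more tools on the machine, the per-process swap figure that top shows can also be read directly from the VmSwap field in /proc/<pid>/status. The following is a minimal Python sketch under that assumption; the function names are illustrative, and processes that cannot be read (or that exit during the scan) are simply skipped.

```python
import os


def parse_vmswap_kb(status_text):
    """Return the VmSwap value in kB from /proc/<pid>/status text, or None."""
    for line in status_text.splitlines():
        if line.startswith("VmSwap:"):
            return int(line.split()[1])
    return None  # kernel threads have no VmSwap line


def swap_by_process(proc_root="/proc"):
    """Map pid -> swapped-out kB for every process with a non-zero VmSwap."""
    result = {}
    if not os.path.isdir(proc_root):
        return result
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "status")) as f:
                kb = parse_vmswap_kb(f.read())
        except OSError:
            continue  # process exited or is not readable
        if kb:
            result[int(entry)] = kb
    return result


if __name__ == "__main__":
    # Print the 20 processes with the most swapped-out memory.
    top = sorted(swap_by_process().items(), key=lambda kv: kv[1], reverse=True)
    for pid, kb in top[:20]:
        print(f"{pid:>8} {kb:>12} kB")
```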

Thank you.

Frits Hoogland


The filesystem is XFS, mount options noatime, inode64, nodiratime, nodev.

Operating system: Red Hat Enterprise Linux 8.9
Kernel: 4.18.0-513.5.1.el8_9.x86_64 #1 SMP

/proc/meminfo
MemTotal:       1055737556 kB
MemFree:         2213440 kB
MemAvailable:   298459084 kB
Buffers:            1340 kB
Cached:         286127736 kB
SwapCached:       837032 kB
Active:         29872372 kB
Inactive:       269357644 kB
Active(anon):    3788932 kB
Inactive(anon):  8167304 kB
Active(file):   26083440 kB
Inactive(file): 261190340 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:      16777212 kB
SwapFree:        8386212 kB
Dirty:           5311136 kB
Writeback:             0 kB
AnonPages:      12670468 kB
Mapped:           138672 kB
Shmem:             64104 kB
KReclaimable:   11589948 kB
Slab:           13762220 kB
SReclaimable:   11589948 kB
SUnreclaim:      2172272 kB
KernelStack:       38240 kB
PageTables:       266476 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    324791060 kB
Committed_AS:   37860836 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      481952 kB
VmallocChunk:          0 kB
Percpu:            97792 kB
HardwareCorrupted:     0 kB
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
ShmemPmdMapped:        0 kB
FileHugePages:         0 kB
FilePmdMapped:         0 kB
HugePages_Total:     704
HugePages_Free:      366
HugePages_Rsvd:        2
HugePages_Surp:        0
Hugepagesize:    1048576 kB
Hugetlb:        738197504 kB
DirectMap4k:    17853212 kB
DirectMap2M:    302462976 kB
DirectMap1G:    752877568 kB

vmstat 1 10
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
42  6 8424412 1743776   1340 299215616    8   11  6786   911    0    0 13  3 82  2  0
29 12 8428456 2678092   1340 298592000 1220 6092 2149152 249176 466196 366703 27  6 62  5  0
32  8 8427984 1701916   1340 299545600 1788 1832 2020852 316652 366820 309808 23  5 67  5  0
41  4 8430328 1702864   1340 299411136  960 4136 2228240 263160 433730 368056 24  6 64  6  0
49  4 8402344 1724772   1340 299495040 1392 6320 2435296 303464 463479 368963 23  7 64  6  0
33  5 8401056 1757348   1340 299520960 1788 4088 2107472 248576 395061 350817 23  9 63  5  0
30 11 8403788 1721484   1340 299539776  560 4012 2237708 229508 426055 384332 25  6 65  5  0
36 10 8409792 1904800   1340 299274848  364 6772 2192364 294444 428661 390878 25  6 64  6  0
33  8 8415368 1804560   1340 299320800  616 7112 2195100 272072 447890 398957 26  6 61  6  0
39  3 8386444 1885732   1340 299333088 2032 7172 2163180 266672 459675 419805 26  7 61  7  0
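To put a number on the ongoing swap activity, the following Python sketch averages the si and so columns of the vmstat output above. The first sample line is left out because vmstat reports averages since boot there; the units are assumed to be KiB per second, which is the vmstat default.

```python
# (si, so) pairs copied from the nine one-second samples above,
# skipping the first line (averages since boot).
samples = [
    (1220, 6092),
    (1788, 1832),
    (960, 4136),
    (1392, 6320),
    (1788, 4088),
    (560, 4012),
    (364, 6772),
    (616, 7112),
    (2032, 7172),
]

total_si = sum(si for si, _ in samples)
total_so = sum(so for _, so in samples)

# Assumption: vmstat default units, so these are KiB per second.
mean_si = total_si / len(samples)
mean_so = total_so / len(samples)

print(f"mean swap-in:  {mean_si:.0f} KiB/s")
print(f"mean swap-out: {mean_so:.0f} KiB/s")
```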

swapon -s
Filename                                Type            Size            Used            Priority
/dev/dm-1                               partition       16777212        8405472         -2

sysctl -a | grep ^vm
vm.admin_reserve_kbytes = 8192
vm.block_dump = 0
vm.compact_unevictable_allowed = 1
vm.compaction_proactiveness = 0
vm.dirty_background_bytes = 0
vm.dirty_background_ratio = 10
vm.dirty_bytes = 0
vm.dirty_expire_centisecs = 3000
vm.dirty_ratio = 20
vm.dirty_writeback_centisecs = 500
vm.dirtytime_expire_seconds = 43200
vm.drop_caches = 0
vm.extfrag_threshold = 500
vm.force_cgroup_v2_swappiness = 0
vm.hugetlb_shm_group = 32022
vm.laptop_mode = 0
vm.legacy_va_layout = 0
vm.lowmem_reserve_ratio = 256 256 32 0 0
vm.max_map_count = 500000
vm.memory_failure_early_kill = 0
vm.memory_failure_recovery = 1
vm.min_free_kbytes = 71274
vm.min_slab_ratio = 5
vm.min_unmapped_ratio = 1
vm.mmap_min_addr = 4096
vm.mmap_rnd_bits = 28
vm.mmap_rnd_compat_bits = 8
vm.nr_hugepages = 704
vm.nr_hugepages_mempolicy = 704
vm.nr_overcommit_hugepages = 0
vm.numa_stat = 1
vm.numa_zonelist_order = Node
vm.oom_dump_tasks = 1
vm.oom_kill_allocating_task = 0
vm.overcommit_kbytes = 0
vm.overcommit_memory = 2
vm.overcommit_ratio = 97
vm.page-cluster = 3
vm.page_lock_unfairness = 5
vm.panic_on_oom = 0
vm.percpu_pagelist_fraction = 0
vm.stat_interval = 1
vm.swappiness = 1
vm.user_reserve_kbytes = 131072
vm.vfs_cache_pressure = 100
vm.watermark_boost_factor = 15000
vm.watermark_scale_factor = 10
vm.zone_reclaim_mode = 0





