Hi experts,

I'm running PostgreSQL 13 under Patroni (in a Kubernetes pod), and twice already Postgres has been killed because it ran out of memory. I'm using logical replication with around 30-40 active subscribers on this machine. The machine has 128 GB of RAM, but only 32 GB is allocated to the Postgres container. How can I find out what is actually causing the out-of-memory kills? Is it a suboptimal Postgres configuration, or something else?

Patroni starts the postmaster with:

/usr/lib/postgresql/13/bin/postgres -D /home/postgres/pgdata/pgroot/data
    --config-file=/home/postgres/pgdata/pgroot/data/postgresql.conf
    --port=5432
    --cluster_name=postgres-cluster
    --wal_level=logical
    --hot_standby=on
    --max_connections=533
    --max_wal_senders=90
    --max_prepared_transactions=0
    --max_locks_per_transaction=64
    --track_commit_timestamp=on
    --max_replication_slots=90
    --max_worker_processes=30
    --wal_log_hints=on
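To see which process is actually eating the memory before the next kill, I'm thinking of running something like this inside the container (a rough sketch of my own, nothing official: it just reads VmRSS from /proc/<pid>/status once a minute and prints the ten largest processes, so the offending PID can later be matched against pg_stat_activity / pg_stat_replication; the interval and top-10 cutoff are arbitrary):

    #!/usr/bin/env python3
    # Rough sketch: once a minute, print the ten processes with the largest
    # resident set, read from /proc/<pid>/status.
    import time
    from pathlib import Path

    def rss_kb(pid_dir):
        # VmRSS is reported in kB; return 0 if the process has already exited
        # or has no VmRSS line (kernel threads).
        try:
            for line in (pid_dir / "status").read_text().splitlines():
                if line.startswith("VmRSS:"):
                    return int(line.split()[1])
        except OSError:
            pass
        return 0

    def comm(pid_dir):
        try:
            return (pid_dir / "comm").read_text().strip()
        except OSError:
            return "?"

    while True:
        procs = [p for p in Path("/proc").iterdir() if p.name.isdigit()]
        top = sorted(((rss_kb(p), comm(p), p.name) for p in procs), reverse=True)[:10]
        print(time.strftime("%F %T"), flush=True)
        for kb, name, pid in top:
            print(f"  pid={pid:>7}  rss={kb / 1048576:6.2f} GiB  ({name})", flush=True)
        time.sleep(60)

Here is the kernel log from the most recent OOM kill: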
Oct 27 07:05:31 node2 kernel: postgres invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=993
Oct 27 07:05:31 node2 kernel: postgres cpuset=docker-6ae67e04710619972d3b1ab5d4c69c318d001c2da47fecee121cdc60279a14a0.scope mems_allowed=0
Oct 27 07:05:31 node2 kernel: CPU: 6 PID: 15536 Comm: postgres Kdump: loaded Tainted: G ------------ T 3.10.0-1160.el7.x86_64 #1
Oct 27 07:05:31 node2 kernel: Hardware name: Kontron MSP8040/4008, BIOS Core: 5.11, MSP804x: 1.57.0943FC77 05/06/2020
Oct 27 07:05:31 node2 kernel: Call Trace:
Oct 27 07:05:31 node2 kernel: [<ffffffffa4581340>] dump_stack+0x19/0x1b
Oct 27 07:05:31 node2 kernel: [<ffffffffa457bc60>] dump_header+0x90/0x229
Oct 27 07:05:31 node2 kernel: [<ffffffffa409d008>] ? ep_poll_callback+0xf8/0x220
Oct 27 07:05:31 node2 kernel: [<ffffffffa3fc1b26>] ? find_lock_task_mm+0x56/0xc0
Oct 27 07:05:31 node2 kernel: [<ffffffffa403c8c8>] ? try_get_mem_cgroup_from_mm+0x28/0x60
Oct 27 07:05:31 node2 kernel: [<ffffffffa3fc208d>] oom_kill_process+0x2cd/0x490
Oct 27 07:05:31 node2 kernel: [<ffffffffa4040cdc>] mem_cgroup_oom_synchronize+0x55c/0x590
Oct 27 07:05:31 node2 kernel: [<ffffffffa4040140>] ? mem_cgroup_charge_common+0xc0/0xc0
Oct 27 07:05:31 node2 kernel: [<ffffffffa3fc2974>] pagefault_out_of_memory+0x14/0x90
Oct 27 07:05:31 node2 kernel: [<ffffffffa457a16c>] mm_fault_error+0x6a/0x157
Oct 27 07:05:31 node2 kernel: [<ffffffffa458e8d1>] __do_page_fault+0x491/0x500
Oct 27 07:05:31 node2 kernel: [<ffffffffa458e975>] do_page_fault+0x35/0x90
Oct 27 07:05:31 node2 kernel: [<ffffffffa458a778>] page_fault+0x28/0x30
Oct 27 07:05:31 node2 kernel: Task in /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod5984f099_30b0_4506_8730_6c72e7c02b78.slice/docker-6ae67e0471061997$
Oct 27 07:05:31 node2 kernel: memory: usage 32768000kB, limit 32768000kB, failcnt 144867
Oct 27 07:05:31 node2 kernel: memory+swap: usage 32768000kB, limit 9007199254740988kB, failcnt 0
Oct 27 07:05:31 node2 kernel: kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
Oct 27 07:05:31 node2 kernel: Memory cgroup stats for /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod5984f099_30b0_4506_8730_6c72e7c02b78.slice: cache:$
Oct 27 07:05:31 node2 kernel: Memory cgroup stats for /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod5984f099_30b0_4506_8730_6c72e7c02b78.slice/docker-$
Oct 27 07:05:31 node2 kernel: Memory cgroup stats for /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod5984f099_30b0_4506_8730_6c72e7c02b78.slice/docker-$
Oct 27 07:05:31 node2 kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Oct 27 07:05:31 node2 kernel: [13159] 0 13159 239 1 3 0 -998 pause
Oct 27 07:05:31 node2 kernel: [13322] 0 13322 1095 97 8 0 993 dumb-init
Oct 27 07:05:31 node2 kernel: [13335] 0 13335 1156 171 8 0 993 sh
Oct 27 07:05:31 node2 kernel: [13411] 0 13411 1137 98 8 0 993 runsvdir
Oct 27 07:05:31 node2 kernel: [13438] 0 13438 1099 98 7 0 993 runsv
Oct 27 07:05:31 node2 kernel: [13439] 0 13439 1099 98 7 0 993 runsv
Oct 27 07:05:31 node2 kernel: [13440] 101 13440 27026 1186 54 0 993 pgqd
Oct 27 07:05:31 node2 kernel: [13441] 101 13441 155215 8237 101 0 993 patroni
Oct 27 07:05:31 node2 kernel: [19532] 101 19532 1740033 46817 171 0 993 postgres
Oct 27 07:05:31 node2 kernel: [19542] 101 19542 1767874 6713 121 0 993 postgres
Oct 27 07:05:31 node2 kernel: [19546] 101 19546 1740173 1445031 3166 0 993 postgres
Oct 27 07:05:31 node2 kernel: [19547] 101 19547 1740069 20060 171 0 993 postgres
Oct 27 07:05:31 node2 kernel: [19548] 101 19548 1740027 4821 86 0 993 postgres
Oct 27 07:05:31 node2 kernel: [19549] 101 19549 1740283 1011 91 0 993 postgres
Oct 27 07:05:31 node2 kernel: [19550] 101 19550 50320 798 77 0 993 postgres
Oct 27 07:05:31 node2 kernel: [19551] 101 19551 1740524 2097 117 0 993 postgres
Oct 27 07:05:31 node2 kernel: [19552] 101 19552 1740766 2043 100 0 993 postgres
Oct 27 07:05:31 node2 kernel: [19556] 101 19556 1744493 16306 476 0 993 postgres
Oct 27 07:05:31 node2 kernel: [19588] 101 19588 1744522 16353 484 0 993 postgres
Oct 27 07:05:31 node2 kernel: [19598] 101 19598 1745580 22539 602 0 993 postgres
Oct 27 07:05:31 node2 kernel: [19609] 101 19609 5909368 4364063 9947 0 993 postgres
Oct 27 07:05:31 node2 kernel: [19617] 101 19617 1745576 16669 494 0 993 postgres
Oct 27 07:05:31 node2 kernel: [19676] 101 19676 1740468 2208 113 0 993 postgres
Oct 27 07:05:31 node2 kernel: [19744] 101 19744 1741155 19698 615 0 993 postgres
Oct 27 07:05:31 node2 kernel: [19776] 101 19776 1742008 105333 1733 0 993 postgres
Oct 27 07:05:31 node2 kernel: [19793] 101 19793 1740738 58109 631 0 993 postgres
Oct 27 07:05:31 node2 kernel: [19799] 101 19799 1740737 29227 546 0 993 postgres
Oct 27 07:05:31 node2 kernel: [19806] 101 19806 1740435 1393 96 0 993 postgres
Oct 27 07:05:31 node2 kernel: [19811] 101 19811 1741021 3658 139 0 993 postgres
Oct 27 07:05:31 node2 kernel: [19971] 101 19971 1740734 15521 449 0 993 postgres
Oct 27 07:05:31 node2 kernel: [20007] 101 20007 1740435 1394 96 0 993 postgres
Oct 27 07:05:31 node2 kernel: [20537] 101 20537 1891639 121701 407 0 993 postgres
Oct 27 07:05:31 node2 kernel: [20623] 101 20623 1889590 121300 405 0 993 postgres
Oct 27 07:05:31 node2 kernel: [20840] 101 20840 1889591 120970 406 0 993 postgres
Oct 27 07:05:31 node2 kernel: [20851] 101 20851 1891639 122165 408 0 993 postgres
Oct 27 07:05:31 node2 kernel: [20955] 101 20955 1889591 120056 405 0 993 postgres
Oct 27 07:05:31 node2 kernel: [21117] 101 21117 1891639 121217 405 0 993 postgres
Oct 27 07:05:31 node2 kernel: [21328] 101 21328 1891639 120761 404 0 993 postgres
Oct 27 07:05:31 node2 kernel: [21332] 101 21332 1889591 119904 402 0 993 postgres
Oct 27 07:05:31 node2 kernel: [21451] 101 21451 1891639 120435 403 0 993 postgres
Oct 27 07:05:31 node2 kernel: [21988] 101 21988 1889591 121723 407 0 993 postgres
Oct 27 07:05:31 node2 kernel: [ 2456] 101 2456 1889592 120006 406 0 993 postgres
Oct 27 07:05:31 node2 kernel: [32261] 101 32261 1878518 100278 365 0 993 postgres
Oct 27 07:05:31 node2 kernel: [ 3352] 101 3352 1926434 149555 459 0 993 postgres
Oct 27 07:05:31 node2 kernel: [ 3893] 101 3893 1926436 149552 459 0 993 postgres
Oct 27 07:05:31 node2 kernel: [15217] 101 15217 1880573 101541 370 0 993 postgres
Oct 27 07:05:31 node2 kernel: [15246] 101 15246 1879605 101892 368 0 993 postgres
Oct 27 07:05:31 node2 kernel: [14774] 101 14774 1773892 26263 219 0 993 postgres
Oct 27 07:05:31 node2 kernel: [15904] 101 15904 1773886 26721 220 0 993 postgres
Oct 27 07:05:31 node2 kernel: [ 9266] 101 9266 1744448 15139 468 0 993 postgres
Oct 27 07:05:31 node2 kernel: [15536] 101 15536 1800641 46553 261 0 993 postgres
Oct 27 07:05:31 node2 kernel: [21552] 101 21552 1918253 139792 444 0 993 postgres
Oct 27 07:05:31 node2 kernel: [22239] 101 22239 1769356 21754 210 0 993 postgres
Oct 27 07:05:31 node2 kernel: [13344] 101 13344 1767312 19434 206 0 993 postgres
Oct 27 07:05:31 node2 kernel: [27568] 101 27568 1743509 11079 327 0 993 postgres
Oct 27 07:05:31 node2 kernel: [27754] 101 27754 1743483 10910 313 0 993 postgres
Oct 27 07:05:31 node2 kernel: [28036] 101 28036 1743550 12152 385 0 993 postgres
Oct 27 07:05:31 node2 kernel: [ 7772] 101 7772 1743520 11171 332 0 993 postgres
Oct 27 07:05:31 node2 kernel: [20855] 101 20855 1743564 13723 422 0 993 postgres
Oct 27 07:05:31 node2 kernel: [30883] 101 30883 1743518 11180 333 0 993 postgres
Oct 27 07:05:31 node2 kernel: [31463] 101 31463 1740560 3099 137 0 993 postgres
Oct 27 07:05:31 node2 kernel: [31466] 101 31466 1740560 3084 137 0 993 postgres
Oct 27 07:05:31 node2 kernel: [31475] 101 31475 1740560 3101 137 0 993 postgres
Oct 27 07:05:31 node2 kernel: [31485] 101 31485 1740560 3080 137 0 993 postgres
Oct 27 07:05:31 node2 kernel: [31513] 101 31513 1740560 3082 137 0 993 postgres
Oct 27 07:05:31 node2 kernel: [ 3321] 101 3321 1753416 13629 164 0 993 postgres
Oct 27 07:05:31 node2 kernel: [ 3325] 101 3325 1753416 13756 164 0 993 postgres
Oct 27 07:05:31 node2 kernel: [16159] 101 16159 1743331 6323 310 0 993 postgres
Oct 27 07:05:31 node2 kernel: [32368] 101 32368 1743330 6156 311 0 993 postgres
Oct 27 07:05:31 node2 kernel: [ 1528] 101 1528 2009656 24435 346 0 993 postgres
-- more lines...
Oct 27 07:05:31 node2 kernel: Memory cgroup out of memory: Kill process 19609 (postgres) score 1526 or sacrifice child
Oct 27 07:05:31 node2 kernel: Killed process 19609 (postgres), UID 101, total-vm:23637472kB, anon-rss:16612848kB, file-rss:19156kB, shmem-rss:824248kB
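If I read the oom-killer table correctly, the total_vm and rss columns are counted in 4 KiB pages rather than kB, so the biggest entries work out as below (my own quick conversion; the PIDs and page counts are copied from the table above):

    # total_vm / rss in the oom-killer table are 4 KiB pages (x86_64), not kB;
    # convert the largest entries to GiB.
    PAGE_KB = 4
    KB_PER_GIB = 1024 * 1024

    rows = {  # pid: (total_vm_pages, rss_pages), copied from the table above
        19609: (5909368, 4364063),  # the backend that got killed
        19546: (1740173, 1445031),
        3352: (1926434, 149555),
        21552: (1918253, 139792),
    }

    for pid, (total_vm, rss) in rows.items():
        print(f"pid {pid}: total_vm {total_vm * PAGE_KB / KB_PER_GIB:.1f} GiB, "
              f"rss {rss * PAGE_KB / KB_PER_GIB:.1f} GiB")

    # pid 19609: total_vm 22.5 GiB, rss 16.6 GiB
    #   (4364063 pages * 4 kB = 17456252 kB, which is exactly the sum of
    #    anon-rss + file-rss + shmem-rss in the "Killed process 19609" line)
    # pid 19546: rss 5.5 GiB
    # pid 3352, pid 21552 and the other ~120000-page backends: ~0.5 GiB each

So one backend alone held about 16.6 GiB resident and another about 5.5 GiB inside the 32 GB cgroup, but from the log alone I can't tell which session, walsender, or worker those PIDs belonged to. The rest of the syslog from right after the kill, in case it is relevant: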
Oct 27 07:05:32 node2 etcd: 2023-10-27 07:05:32.607769 W | etcdserver: read-only range request "key:\"/registry/services/endpoints/default/kubernetes\" " with result "$
Oct 27 07:05:34 node2 kubelet: I1027 07:05:34.624970 24409 kubelet.go:1926] SyncLoop (UPDATE, "api"): "postgres-cluster-0_postgres(5984f099-30b0-4506-873$
Oct 27 07:05:38 node2 kubelet: I1027 07:05:38.823153 24409 setters.go:86] Using node IP: "x.x.x.x"
Oct 27 07:05:44 node2 kubelet: I1027 07:05:44.011520 24409 container_manager_linux.go:490] [ContainerManager]: Discovered runtime cgroups name: /systemd/system.slice
Oct 27 07:05:44 node2 kubelet: I1027 07:05:44.826318 24409 kubelet.go:1926] SyncLoop (UPDATE, "api"): "postgres-cluster-0_postgres(5984f099-30b0-4506-873$
Oct 27 07:05:48 node2 kubelet: I1027 07:05:48.915249 24409 setters.go:86] Using node IP: "x.x.x.x"
Oct 27 07:05:55 node2 kubelet: I1027 07:05:55.104931 24409 kubelet.go:1926] SyncLoop (UPDATE, "api"): "postgres-cluster-0_postgres(5984f099-30b0-4506-873$
Oct 27 07:05:58 node2 kubelet: I1027 07:05:58.997936 24409 setters.go:86] Using node IP: "x.x.x.x"
Oct 27 07:06:01 node2 systemd: Started Session 332365 of user root.
Oct 27 07:06:06 node2 kubelet: I1027 07:06:06.133210 24409 kubelet.go:1926] SyncLoop (UPDATE, "api"): "postgres-cluster-0_postgres(5984f099-30b0-4506-873$
Oct 27 07:06:08 node2 kubelet: I1027 07:06:08.029905 24409 kubelet_getters.go:178] "Pod status updated" pod="kube-system/kube-scheduler-node2" status=Running
Oct 27 07:06:08 node2 kubelet: I1027 07:06:08.029966 24409 kubelet_getters.go:178] "Pod status updated" pod="kube-system/kube-apiserver-node2" status=Running
Oct 27 07:06:08 node2 kubelet: I1027 07:06:08.029996 24409 kubelet_getters.go:178] "Pod status updated" pod="kube-system/kube-controller-manager-node2" status=Ru$
Oct 27 07:06:08 node2 kubelet: E1027 07:06:08.874227 24409 remote_runtime.go:392] ExecSync 28f30220f15ec2a26071c2175f24f77d1b20870fcb3c24bc659d824223d93deb '/usr/bin$
Oct 27 07:06:08 node2 kubelet: I1027 07:06:08.874326 24409 prober.go:117] Readiness probe for "calico-kube-controllers-7c5b64bf96-q6rjm_kube-system(14519a12-7721-43b$
Oct 27 07:06:09 node2 kubelet: I1027 07:06:09.090193 24409 setters.go:86] Using node IP: "x.x.x.x"
Oct 27 07:06:09 node2 etcd: 2023-10-27 07:06:09.223921 I | mvcc: store.index: compact 76535976
Oct 27 07:06:09 node2 etcd: 2023-10-27 07:06:09.253047 I | mvcc: finished scheduled compaction at 76535976 (took 28.190376ms)
Oct 27 07:06:15 node2 kubelet: I1027 07:06:15.232655 24409 kubelet.go:1926] SyncLoop (UPDATE, "api"): "postgres-cluster-0_postgres(5984f099-30b0-4506-873$
Oct 27 07:06:19 node2 kubelet: W1027 07:06:19.053762 24409 kubelet_pods.go:880] Unable to retrieve pull secret /registrypullsecret for /-arcmana$
Oct 27 07:06:19 node2 kubelet: I1027 07:06:19.198356 24409 setters.go:86] Using node IP: "x.x.x.x"
Oct 27 07:06:25 node2 kubelet: I1027 07:06:25.132514 24409 kubelet.go:1926] SyncLoop (UPDATE, "api"): "postgres-cluster-0_postgres(5984f099-30b0-4506-873$
Oct 27 07:06:29 node2 kubelet: I1027 07:06:29.279555 24409 setters.go:86] Using node IP: "x.x.x.x"
Oct 27 07:06:34 node2 kubelet: I1027 07:06:34.596501 24409 kubelet.go:1926] SyncLoop (UPDATE, "api"): "postgres-cluster-0_postgres(5984f099-30b0-4506-873$
Oct 27 07:06:36 node2 etcd: 2023-10-27 07:06:36.005277 W | etcdserver: read-only range request "key:\"/registry/ingress/\" range_end:\"/registry/ingress0\" count_only:$
Oct 27 07:06:36 node2 etcd: 2023-10-27 07:06:36.005533 W | etcdserver: request "header:<ID:1981451119914121853 username:\"etcd-node-node1\" auth_revision:1 > txn:<$
Oct 27 07:06:36 node2 etcd: 2023-10-27 07:06:36.005961 W | etcdserver: read-only range request "key:\"/registry/volumeattachments/\" range_end:\"/registry/volumeattach$
Oct 27 07:06:36 node2 etcd: 2023-10-27 07:06:36.005996 W | etcdserver: read-only range request "key:\"/registry/crd.projectcalico.org/ipamblocks/\" range_end:\"/regist$
Oct 27 07:06:36 node2 etcd: 2023-10-27 07:06:36.006012 W | etcdserver: read-only range request "key:\"/registry/leases/kube-system/kube-scheduler\" " with result "rang$
Oct 27 07:06:36 node2 etcd: 2023-10-27 07:06:36.006092 W | etcdserver: read-only range request "key:\"/registry/namespaces/kube-system\" " with result "range_response_$
Oct 27 07:06:36 node2 etcd: 2023-10-27 07:06:36.307799 W | etcdserver: read-only range request "key:\"/registry/apiextensions.k8s.io/customresourcedefinitions/\" range$
Oct 27 07:06:36 node2 etcd: 2023-10-27 07:06:36.308862 W | etcdserver: read-only range request "key:\"/registry/health\" " with result "range_response_count:0 size:7" $
Oct 27 07:06:36 node2 etcd: 2023-10-27 07:06:36.308906 W | etcdserver: read-only range request "key:\"/registry/namespaces/default\" " with result "r