Hello,
I have a setup in which I have constantly 8 connections to the database
open which query data for map rendering. The query involve only a hand
full of tables, but could be complex and use postgis functions.
The connection stays open. Only read-requests are happening, no
update/insert/delete.
I observe available memory going down at a rate of roughly 2GB per hour.
I am running the database inside a docker container, using latest
postgis/postgis image which is then using the latest postgres image as a
base.
PostgreSQL 13.2 (Debian 13.2-1.pgdg100+1) on x86_64-pc-linux-gnu,
compiled by gcc (Debian 8.3.0-6) 8.3.0, 64-bit
POSTGIS=""3.1.1 aaf4c79"" [EXTENSION] PGSQL=""130""
GEOS=""3.7.1-CAPI-1.11.1 27a5e771"" PROJ=""Rel. 5.2.0, September 15th,
2018"" LIBXML=""2.9.4"" LIBJSON=""0.12.1"" LIBPROTOBUF=""1.3.1""
WAGYU=""0.5.0 (Internal)""
It keeps increasing RssAnon memory of the backends until OOM.
The wiki suggested to dump MemoryContext states for more details, but
something strange happens when attaching gdb. It seems that the process
is immediately killed and I can no longer dump such details.
Is this related to running within docker? I added capabilities and
seccomp for gdb:
cap_add:
- SYS_PTRACE
security_opt:
- seccomp:unconfined
with count(*) in pg_stat_activity I see 20 connections. Besides the 8
expected client backend it also lists some parallel worker and other
postgres maintenance tasks.
work_mem is set to 50MB. shared_buffers to 3500MB. By looking at
/proc/meminfo I think that all of this shared memory is statically
allocated as HugePages. I see no increase there. Only increase is within
RssAnon.
Any ideas on how to debug deeper into this issue?
Any idea why gdb is not working as expected? More logs below.
It seems that the binary is terminating while gdb is attaching also
sending a cont immediately does not change it.
Any ideas what configuration parameter might help? I had a similar setup
working with older versions of postgresql and postgis and it did not
show this constant loss of memory.
I can work-around by periodically sending a SIGHUP to my clients, which
will then reconnect to postgresql and this frees the memory of the
backends. So is something kept longer in memory than needed?
Others seem to have similar issues using slightly older versions as well
https://github.com/openstreetmap/mod_tile/issues/181#issuecomment-738303445
I would appreciate some ideas/pointers on how to track this down further.
Logs of OOM (before using hugepages, but this did not change the situation):
postgres invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE),
order=0, oom_score_adj=0
CPU: 9 PID: 3322620 Comm: postgres Not tainted 5.4.0-67-generic #75-Ubuntu
Hardware name: Gigabyte Technology Co., Ltd. B360 HD3P-LM/B360HD3PLM-CF,
BIOS F7 HZ 07/24/2020
Call Trace:
dump_stack+0x6d/0x8b
dump_header+0x4f/0x1eb
oom_kill_process.cold+0xb/0x10
out_of_memory.part.0+0x1df/0x3d0
out_of_memory+0x6d/0xd0
__alloc_pages_slowpath+0xd5e/0xe50
__alloc_pages_nodemask+0x2d0/0x320
alloc_pages_current+0x87/0xe0
__page_cache_alloc+0x72/0x90
pagecache_get_page+0xbf/0x300
filemap_fault+0x6b2/0xa50
? unlock_page_memcg+0x12/0x20
? page_add_file_rmap+0xff/0x1a0
? xas_load+0xd/0x80
? xas_find+0x17f/0x1c0
? filemap_map_pages+0x24c/0x380
ext4_filemap_fault+0x32/0x50
__do_fault+0x3c/0x130
do_fault+0x24b/0x640
? __switch_to_asm+0x34/0x70
__handle_mm_fault+0x4c5/0x7a0
handle_mm_fault+0xca/0x200
do_user_addr_fault+0x1f9/0x450
__do_page_fault+0x58/0x90
do_page_fault+0x2c/0xe0
page_fault+0x34/0x40
RIP: 0033:0x558ddf0beded
Code: Bad RIP value.
RSP: 002b:00007ffe5214a020 EFLAGS: 00010202
RAX: 00007fea26b16b28 RBX: 0000000000000028 RCX: 00007fea26b16b68
RDX: 0000000000000028 RSI: 0000000000000000 RDI: 00007fea26b16b28
RBP: 0000000000000010 R08: 00007fea26b16b28 R09: 0000000000000019
R10: 0000000000000001 R11: 0000000000000001 R12: 00000000ffffffff
R13: 00007fea26af76d8 R14: 00007fea26af7728 R15: 0000000000000000
Mem-Info:
active_anon:29797121 inactive_anon:2721653 isolated_anon:32
active_file:323 inactive_file:83 isolated_file:0
unevictable:16 dirty:14 writeback:0 unstable:0
slab_reclaimable:85925 slab_unreclaimable:106003
mapped:1108591 shmem:14943591 pagetables:69567 bounce:0
free:148637 free_pcp:1619 free_cma:0
Node 0 active_anon:119188484kB inactive_anon:10886612kB
active_file:1292kB inactive_file:332kB unevictable:64kB
isolated(anon):128kB isolated(file):0kB mapped:4434364kB dirty:56kB
writeback:0kB shmem:59774364kB shmem_thp: 0kB shmem_pmdmapped: 0kB
anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
Node 0 DMA free:15904kB min:8kB low:20kB high:32kB active_anon:0kB
inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB
writepending:0kB present:15988kB managed:15904kB mlocked:0kB
kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB
free_cma:0kB
lowmem_reserve[]: 0 809 128670 128670 128670
Node 0 DMA32 free:511716kB min:424kB low:1252kB high:2080kB
active_anon:328544kB inactive_anon:8540kB active_file:204kB
inactive_file:0kB unevictable:0kB writepending:0kB present:947448kB
managed:881912kB mlocked:0kB kernel_stack:0kB pagetables:416kB
bounce:0kB free_pcp:1412kB local_pcp:288kB free_cma:0kB
lowmem_reserve[]: 0 0 127860 127860 127860
Node 0 Normal free:66928kB min:67148kB low:198076kB high:329004kB
active_anon:118859812kB inactive_anon:10877988kB active_file:1812kB
inactive_file:1480kB unevictable:64kB writepending:56kB
present:133160960kB managed:130937320kB mlocked:64kB
kernel_stack:18336kB pagetables:277852kB bounce:0kB free_pcp:5064kB
local_pcp:384kB free_cma:0kB
lowmem_reserve[]: 0 0 0 0 0
Node 0 DMA: 0*4kB 0*8kB 0*16kB 1*32kB (U) 2*64kB (U) 1*128kB (U) 1*256kB
(U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15904kB
Node 0 DMA32: 1388*4kB (UME) 1274*8kB (UME) 1772*16kB (UE) 1564*32kB
(UME) 1082*64kB (UE) 514*128kB (UME) 160*256kB (UE) 46*512kB (UME)
23*1024kB (UE) 11*2048kB (UME) 42*4096kB (UME) = 511808kB
Node 0 Normal: 715*4kB (UEH) 108*8kB (UEH) 1400*16kB (UMEH) 1156*32kB
(UMEH) 58*64kB (UMEH) 7*128kB (UMEH) 1*256kB (U) 0*512kB 0*1024kB
0*2048kB 0*4096kB = 67980kB
Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0
hugepages_size=1048576kB
Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0
hugepages_size=2048kB
14982418 total pagecache pages
38157 pages in swap cache
Swap cache stats: add 14086008, delete 14047525, find 102734363/105748939
Free swap = 0kB
Total swap = 4189180kB
33531099 pages RAM
0 pages HighMem/MovableOnly
572315 pages reserved
0 pages cma reserved
0 pages hwpoisoned
oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=fae7f79d710f4449fd87c58f38eb164a470e3f837b33630c6c10a9fbca10a82b,mems_allowed=0,global_oom,task_memcg=/docker/e88b65d20ef39588c6bf9c00e7aa2946f134a61a6195c210f7081d7ed4d9a5fa,task=postgres,pid=2453393,uid=999
Out of memory: Killed process 2453393 (postgres) total-vm:10625568kB,
anon-rss:6768188kB, file-rss:4kB, shmem-rss:3592772kB, UID:999
pgtables:20756kB oom_score_adj:0
oom_reaper: reaped process 2453393 (postgres), now anon-rss:0kB,
file-rss:0kB, shmem-rss:3592772kB
Logs of gdb attach
# gdb -p 1296 -ex cont
GNU gdb (Debian 8.2.1-2+b3) 8.2.1
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
<http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word".
Attaching to process 1296
Reading symbols from /usr/lib/postgresql/13/bin/postgres...Reading
symbols from
/usr/lib/debug/.build-id/31/ae2853776500091d313e76cf679017e697884b.debug...done.
done.
Reading symbols from /lib/x86_64-linux-gnu/libpthread.so.0...Reading
symbols from
/usr/lib/debug/.build-id/e9/1114987a0147bd050addbd591eb8994b29f4b3.debug...done.
done.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Reading symbols from /usr/lib/x86_64-linux-gnu/libxml2.so.2...(no
debugging symbols found)...done.
Reading symbols from /lib/x86_64-linux-gnu/libpam.so.0...(no debugging
symbols found)...done.
Reading symbols from /usr/lib/x86_64-linux-gnu/libssl.so.1.1...(no
debugging symbols found)...done.
Reading symbols from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1...(no
debugging symbols found)...done.
Reading symbols from /usr/lib/x86_64-linux-gnu/libgssapi_krb5.so.2...(no
debugging symbols found)...done.
Reading symbols from /lib/x86_64-linux-gnu/librt.so.1...Reading symbols
from
/usr/lib/debug/.build-id/5d/cf98ad684962be494af28a1051793fd39e4ebc.debug...done.
done.
Reading symbols from /lib/x86_64-linux-gnu/libdl.so.2...Reading symbols
from
/usr/lib/debug/.build-id/d3/583c742dd47aaa860c5ae0c0c5bdbcd2d54f61.debug...done.
done.
Reading symbols from /lib/x86_64-linux-gnu/libm.so.6...Reading symbols
from
/usr/lib/debug/.build-id/88/5dda4b4a5cea600e7b5b98c1ad86996c8d2299.debug...done.
done.
Reading symbols from /usr/lib/x86_64-linux-gnu/libldap_r-2.4.so.2...(no
debugging symbols found)...done.
Reading symbols from /usr/lib/x86_64-linux-gnu/libicui18n.so.63...(no
debugging symbols found)...done.
Reading symbols from /usr/lib/x86_64-linux-gnu/libicuuc.so.63...(no
debugging symbols found)...done.
Reading symbols from /lib/x86_64-linux-gnu/libsystemd.so.0...(no
debugging symbols found)...done.
Reading symbols from /lib/x86_64-linux-gnu/libc.so.6...Reading symbols
from
/usr/lib/debug/.build-id/18/b9a9a8c523e5cfe5b5d946d605d09242f09798.debug...done.
done.
Reading symbols from /lib64/ld-linux-x86-64.so.2...Reading symbols from
/usr/lib/debug/.build-id/f2/5dfd7b95be4ba386fd71080accae8c0732b711.debug...done.
done.
Reading symbols from /usr/lib/x86_64-linux-gnu/libicudata.so.63...(no
debugging symbols found)...done.
Reading symbols from /lib/x86_64-linux-gnu/libz.so.1...(no debugging
symbols found)...done.
Reading symbols from /lib/x86_64-linux-gnu/liblzma.so.5...(no debugging
symbols found)...done.
Reading symbols from /lib/x86_64-linux-gnu/libaudit.so.1...(no debugging
symbols found)...done.
Reading symbols from /usr/lib/x86_64-linux-gnu/libkrb5.so.3...(no
debugging symbols found)...done.
Reading symbols from /usr/lib/x86_64-linux-gnu/libk5crypto.so.3...(no
debugging symbols found)...done.
Reading symbols from /lib/x86_64-linux-gnu/libcom_err.so.2...(no
debugging symbols found)...done.
Reading symbols from /usr/lib/x86_64-linux-gnu/libkrb5support.so.0...(no
debugging symbols found)...done.
Reading symbols from /lib/x86_64-linux-gnu/libkeyutils.so.1...(no
debugging symbols found)...done.
Reading symbols from /lib/x86_64-linux-gnu/libresolv.so.2...Reading
symbols from
/usr/lib/debug/.build-id/02/6c3ba167f64f631eb8781fca2269fbc2ee7ca5.debug...done.
done.
Reading symbols from /usr/lib/x86_64-linux-gnu/liblber-2.4.so.2...(no
debugging symbols found)...done.
Reading symbols from /usr/lib/x86_64-linux-gnu/libsasl2.so.2...(no
debugging symbols found)...done.
Reading symbols from /usr/lib/x86_64-linux-gnu/libgnutls.so.30...(no
debugging symbols found)...done.
Reading symbols from /usr/lib/x86_64-linux-gnu/libstdc++.so.6...(no
debugging symbols found)...done.
Reading symbols from /lib/x86_64-linux-gnu/libgcc_s.so.1...(no debugging
symbols found)...done.
Reading symbols from /usr/lib/x86_64-linux-gnu/liblz4.so.1...(no
debugging symbols found)...done.
Reading symbols from /lib/x86_64-linux-gnu/libgcrypt.so.20...(no
debugging symbols found)...done.
Reading symbols from /lib/x86_64-linux-gnu/libcap-ng.so.0...(no
debugging symbols found)...done.
Reading symbols from /usr/lib/x86_64-linux-gnu/libp11-kit.so.0...(no
debugging symbols found)...done.
Reading symbols from /usr/lib/x86_64-linux-gnu/libidn2.so.0...(no
debugging symbols found)...done.
Reading symbols from /usr/lib/x86_64-linux-gnu/libunistring.so.2...(no
debugging symbols found)...done.
Reading symbols from /usr/lib/x86_64-linux-gnu/libtasn1.so.6...(no
debugging symbols found)...done.
Reading symbols from /usr/lib/x86_64-linux-gnu/libnettle.so.6...(no
debugging symbols found)...done.
Reading symbols from /usr/lib/x86_64-linux-gnu/libhogweed.so.4...(no
debugging symbols found)...done.
Reading symbols from /usr/lib/x86_64-linux-gnu/libgmp.so.10...(no
debugging symbols found)...done.
Reading symbols from /lib/x86_64-linux-gnu/libgpg-error.so.0...(no
debugging symbols found)...done.
Reading symbols from /usr/lib/x86_64-linux-gnu/libffi.so.6...(no
debugging symbols found)...done.
Reading symbols from /lib/x86_64-linux-gnu/libnss_files.so.2...Reading
symbols from
/usr/lib/debug/.build-id/4b/ff8b782e1602c596e856bdef06e642e50e7fa7.debug...done.
done.
Reading symbols from /usr/lib/postgresql/13/lib/postgis-3.so...(no
debugging symbols found)...done.
Reading symbols from /usr/lib/x86_64-linux-gnu/libgeos_c.so.1...(no
debugging symbols found)...done.
Reading symbols from /usr/lib/x86_64-linux-gnu/libproj.so.13...(no
debugging symbols found)...done.
Reading symbols from /usr/lib/x86_64-linux-gnu/libjson-c.so.3...(no
debugging symbols found)...done.
Reading symbols from /usr/lib/x86_64-linux-gnu/libprotobuf-c.so.1...(no
debugging symbols found)...done.
Reading symbols from /usr/lib/x86_64-linux-gnu/libSFCGAL.so.1...(no
debugging symbols found)...done.
Reading symbols from /usr/lib/x86_64-linux-gnu/libgeos-3.7.1.so...(no
debugging symbols found)...done.
Reading symbols from /usr/lib/x86_64-linux-gnu/libCGAL_Core.so.13...(no
debugging symbols found)...done.
Reading symbols from
/usr/lib/x86_64-linux-gnu/libboost_program_options.so.1.67.0...(no
debugging symbols found)...done.
Reading symbols from
/usr/lib/x86_64-linux-gnu/libboost_chrono.so.1.67.0...(no debugging
symbols found)...done.
Reading symbols from
/usr/lib/x86_64-linux-gnu/libboost_filesystem.so.1.67.0...(no debugging
symbols found)...done.
Reading symbols from
/usr/lib/x86_64-linux-gnu/libboost_timer.so.1.67.0...(no debugging
symbols found)...done.
Reading symbols from
/usr/lib/x86_64-linux-gnu/libboost_unit_test_framework.so.1.67.0...(no
debugging symbols found)...done.
Reading symbols from
/usr/lib/x86_64-linux-gnu/libboost_thread.so.1.67.0...(no debugging
symbols found)...done.
Reading symbols from
/usr/lib/x86_64-linux-gnu/libboost_system.so.1.67.0...(no debugging
symbols found)...done.
Reading symbols from
/usr/lib/x86_64-linux-gnu/libboost_serialization.so.1.67.0...(no
debugging symbols found)...done.
Reading symbols from
/usr/lib/x86_64-linux-gnu/libboost_date_time.so.1.67.0...(no debugging
symbols found)...done.
Reading symbols from
/usr/lib/x86_64-linux-gnu/libboost_atomic.so.1.67.0...(no debugging
symbols found)...done.
Reading symbols from /usr/lib/x86_64-linux-gnu/libCGAL.so.13...(no
debugging symbols found)...done.
Reading symbols from /usr/lib/x86_64-linux-gnu/libgmpxx.so.4...(no
debugging symbols found)...done.
Reading symbols from /usr/lib/x86_64-linux-gnu/libmpfr.so.6...(no
debugging symbols found)...done.
Reading symbols from /usr/lib/postgresql/13/lib/hstore.so...Reading
symbols from
/usr/lib/debug/.build-id/ae/d0b2a75765cea927097e4d89b4e2f39a5534be.debug...done.
done.
Reading symbols from /usr/lib/postgresql/13/lib/llvmjit.so...Reading
symbols from
/usr/lib/debug/.build-id/d3/f52dce996ffe1929204774abcd4fa75fbb317e.debug...done.
done.
Reading symbols from /usr/lib/x86_64-linux-gnu/libLLVM-7.so.1...(no
debugging symbols found)...done.
Reading symbols from /usr/lib/x86_64-linux-gnu/libedit.so.2...(no
debugging symbols found)...done.
Reading symbols from /lib/x86_64-linux-gnu/libtinfo.so.6...(no debugging
symbols found)...done.
Reading symbols from /usr/lib/x86_64-linux-gnu/libbsd.so.0...(no
debugging symbols found)...done.
0x00007fe4bb39a7b7 in epoll_wait (epfd=4, events=0x55d1517d4990,
maxevents=maxevents@entry=1, timeout=timeout@entry=-1) at
../sysdeps/unix/sysv/linux/epoll_wait.c:30
30 ../sysdeps/unix/sysv/linux/epoll_wait.c:
No such file or directory.
Continuing.
Program received signal SIGQUIT, Quit.
0x00007fe4bb39a7b7 in epoll_wait (epfd=4, events=0x55d1517d4990,
maxevents=maxevents@entry=1, timeout=timeout@entry=-1) at
../sysdeps/unix/sysv/linux/epoll_wait.c:30
30 in ../sysdeps/unix/sysv/linux/epoll_wait.c
(gdb) p MemoryContextStats(TopMemoryContext)
Couldn't get extended state status: No such process.
(gdb) bt
#0 0x00007fe4bb39a7b7 in epoll_wait (epfd=4, events=0x55d1517d4990,
maxevents=1, maxevents@entry=<error reading variable: Cannot access
memory at address 0x7fff7515518c>, timeout=-1, timeout@entry=<error
reading variable: Cannot access memory at address 0x7fff75155178>)
at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
#1 0x000055d14fe9d089 in WaitEventSetWaitBlock (nevents=<error reading
variable: Cannot access memory at address 0x7fff7515518c>,
occurred_events=<error reading variable: Cannot access memory at address
0x7fff75155168>,
cur_timeout=<error reading variable: Cannot access memory at
address 0x7fff75155178>, set=0x55d1517d4918) at
./build/../src/backend/storage/ipc/latch.c:1295
#2 WaitEventSetWait (set=0x55d1517d4918, timeout=<error reading
variable: Cannot access memory at address 0x7fff75155170>,
occurred_events=<error reading variable: Cannot access memory at address
0x7fff75155168>,
nevents=<error reading variable: Cannot access memory at address
0x7fff7515518c>, wait_event_info=<optimized out>) at
./build/../src/backend/storage/ipc/latch.c:1247
Backtrace stopped: Cannot access memory at address 0x7fff75155208
(gdb) cont
Continuing.
Program terminated with signal SIGKILL, Killed.
The program no longer exists.
(gdb) detach
The program is not being run.
(gdb) quit