On 30.03.2021 20:46, Tom Lane wrote:
Stephan Knauss <pgsql@xxxxxxxxxxxxxxxxxx> writes:
The wiki suggested to dump MemoryContext states for more details, but
something strange happens when attaching gdb. It seems that the process
is immediately killed and I can no longer dump such details.
(I think the -v option is the one that matters on Linux, not -d
as you might guess). The idea here is that the backends would
get an actual ENOMEM failure from malloc() before reaching the
point where the kernel's OOM-kill behavior takes over. Given
that, they'd dump memory maps to stderr of their own accord,
and you could maybe get some insight as to what's leaking.
This'd also reduce the severity of the problem when it does
happen.
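To illustrate the mechanism described above: a minimal sketch, assuming the -v option mentioned is the shell's "ulimit -v", which on Linux sets RLIMIT_AS (the cap on a process's address space). The child process below caps itself and then asks for more memory than the cap allows, so the allocation fails up front instead of the kernel's OOM killer stepping in later. The 1 GiB limit and the helper name allocation_under_limit are made up for the illustration; this is Python standing in for a backend, not PostgreSQL code.

```python
import subprocess
import sys

# Child program: cap its own address space, then request more than the cap.
# Assumption: Linux, where RLIMIT_AS is enforced at allocation time.
CHILD = r"""
import resource, sys
limit = int(sys.argv[1])
soft, hard = resource.getrlimit(resource.RLIMIT_AS)
if hard != resource.RLIM_INFINITY:
    limit = min(limit, hard)          # never try to exceed the hard limit
resource.setrlimit(resource.RLIMIT_AS, (limit, hard))
try:
    buf = bytearray(limit * 2)        # larger than the whole address-space cap
    print("allocated")
except MemoryError:                   # malloc() returned NULL (ENOMEM)
    print("allocation failed")
"""

def allocation_under_limit(limit_bytes=1 << 30):
    """Run the child with the given address-space cap and report what happened."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD, str(limit_bytes)],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()

if __name__ == "__main__":
    print(allocation_under_limit())
```

The point of the design is that the failure is reported inside the process, which can then log diagnostics of its own, rather than the process being killed from outside.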
Hello Tom, the output below looks similar to the OOM output you
expected. Can you give a hint how to interpret the results?
I had a backend which already had a larger amount of memory allocated,
so I gave "gcore -a" a try.
Contrary to the advertised behavior, the process did not continue to
run, but at least I got a core file. This is probably because gcore just
calls gdb attach, which somehow triggers a SIGKILL of all backends.
At 4.2GB in size, it hopefully contains most of the relevant memory
structures. Without a running process I still cannot call
MemoryContextStats(), but I found a macro which claims to decode the
memory structure post mortem:
https://www.cybertec-postgresql.com/en/checking-per-memory-context-memory-consumption/
This gave me the following memory structure:
How should this be interpreted? The sizes appear to be in bytes, since
the macro computes them from pointer differences. But the numbers look
rather small, given that the backend had roughly 6GB of RSS.
I thought it might print the overall size of a context and then,
indented below it, the memory of its children. But the numbers indicate
that this is not the case, since a higher-level context can be smaller
than its child:
CachedPlanSource: 67840
unnamed prepared statement: 261920
So how should I read it, and is there any indication of why the memory
footprint is constantly increasing? Is there any indication of where the
multiple gigabytes are allocated?
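One way to check how much memory the listing accounts for is to total it with a small script. This is a minimal sketch, assuming the macro prints one "name: bytes" line per context and that each figure covers only that context's own blocks (sum_context_blocks walks the block list of a single context and never visits its children). The function name summarize and the short sample are illustrative; the sample values are taken from the output in this mail.

```python
import re
from collections import defaultdict

# Matches lines such as "index info: 3872"; leading indentation is ignored.
LINE = re.compile(r"^\s*(?P<name>.+?):\s*(?P<size>\d+)\s*$")

def summarize(text):
    """Return (total_bytes, {name: (count, bytes)}) for 'name: bytes' lines."""
    per_name = defaultdict(lambda: [0, 0])
    total = 0
    for line in text.splitlines():
        match = LINE.match(line)
        if not match:
            continue  # skip gdb prompts, pager lines and blank lines
        size = int(match.group("size"))
        entry = per_name[match.group("name")]
        entry[0] += 1       # number of contexts with this name
        entry[1] += size    # bytes held in their own blocks
        total += size
    return total, {name: tuple(v) for name, v in per_name.items()}

sample = """\
TopMemoryContext: 109528
CacheMemoryContext: 1302944
CachedPlan: 138016
CachedPlanSource: 67840
unnamed prepared statement: 261920
index info: 1824
index info: 3872
"""

total, by_name = summarize(sample)
print(total)                  # 1885944
print(by_name["index info"])  # (2, 5696)
```

Run over the complete listing, the total is the amount of memory the macro can see in the context tree, which can then be compared against the backend's RSS.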
root@0ec98d20bda2:/# gdb /usr/lib/postgresql/13/bin/postgres core.154218 <gdb-context
GNU gdb (Debian 8.2.1-2+b3) 8.2.1
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/lib/postgresql/13/bin/postgres...Reading symbols from /usr/lib/debug/.build-id/31/ae2853776500091d313e76cf679017e697884b.debug...done.
done.
warning: core file may not match specified executable file.
[New LWP 154218]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `postgres: osm gis 172.20.0.3(51894) idle'.
#0  0x00007fc01cfa07b7 in epoll_wait (epfd=4, events=0x55f403584080, maxevents=maxevents@entry=1, timeout=timeout@entry=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
30      ../sysdeps/unix/sysv/linux/epoll_wait.c: No such file or directory.
(gdb) >>>> > > >>>(gdb) (gdb) >>>> > > >>>>> > > >>(gdb) (gdb)
TopMemoryContext: 109528
dynahash: 7968
HandleParallelMessages: 7968
dynahash: 7968
dynahash: 7968
dynahash: 7968
dynahash: 24392
dynahash: 24352
RowDescriptionContext: 24352
MessageContext: 7968
dynahash: 7968
dynahash: 32544
TransactionAbortContext: 32544
dynahash: 7968
TopPortalContext: 7968
dynahash: 16160
CacheMemoryContext: 1302944
CachedPlan: 138016
CachedPlanSource: 67840
unnamed prepared statement: 261920
index info: 1824
index info: 1824
index info: 3872
index info: 1824
index info: 1824
index info: 3872
index info: 3872
index info: 3872
index info: 1824
index info: 3872
relation rules: 32544
index info: 1824
index info: 1824
index info: 1824
index info: 3872
relation rules: 24352
index info: 3872
index info: 3872
index info: 1824
index info: 3872
index info: 3872
index info: 3872
index info: 1824
index info: 3872
index info: 1824
index info: 3872
relation rules: 32544
index info: 1824
index info: 2848
index info: 1824
index info: 3872
index info: 3872
index info: 3872
index info: 3872
index info: 3872
index info: 3872
index info: 3872
index info: 1824
index info: 3872
index info: 1824
index info: 1824
relation rules: 32544
index info: 1824
index info: 2848
index info: 1824
index info: 800
index info: 1824
index info: 800
index info: 800
index info: 2848
index info: 1824
index info: 800
index info: 800
index info: 800
index info: 2848
index info: 1824
index info: 1824
--Type <RET> for more, q to quit, c to continue without paging--
index info: 2848
index info: 1824
index info: 1824
index info: 800
index info: 1824
index info: 800
index info: 800
index info: 800
index info: 2848
index info: 2848
index info: 1824
index info: 1824
index info: 800
index info: 800
index info: 2848
index info: 800
index info: 1824
index info: 1824
index info: 800
index info: 1824
index info: 1824
index info: 1824
index info: 800
index info: 1824
index info: 1824
index info: 1824
index info: 800
index info: 2848
index info: 2848
index info: 2848
index info: 800
index info: 800
index info: 1824
index info: 1824
index info: 1824
index info: 800
index info: 1824
index info: 1824
index info: 2848
index info: 1824
index info: 1824
index info: 1824
index info: 1824
index info: 800
index info: 1824
index info: 2848
index info: 800
index info: 1824
index info: 800
index info: 1824
index info: 1824
index info: 800
index info: 1824
index info: 1824
index info: 1824
index info: 800
index info: 1824
index info: 2848
index info: 1824
index info: 1824
index info: 1824
index info: 1824
index info: 1824
index info: 1824
index info: 1824
WAL record construction: 49544
dynahash: 7968
MdSmgr: 7968
dynahash: 16160
dynahash: 103896
ErrorContext: 7968
(gdb) quit
root@0ec98d20bda2:/# cat gdb-context
# Print the name of one context and the total size of its own blocks.
# Child contexts are not included in the sum.
define sum_context_blocks
  set $context = $arg0
  set $block = ((AllocSet) $context)->blocks
  set $size = 0
  while ($block)
    set $size = $size + (((AllocBlock) $block)->endptr - ((char *) $block))
    set $block = ((AllocBlock) $block)->next
  end
  printf "%s: %d\n",((MemoryContext)$context)->name, $size
end

# Recursively print a context and its children: $arg0 is the depth,
# $arg1 the context. Convenience variables are global in gdb, so the
# depth is appended to each name to keep the recursion levels apart.
define walk_contexts
  set $parent_$arg0 = ($arg1)
  set $indent_$arg0 = ($arg0)
  set $i_$arg0 = $indent_$arg0
  while ($i_$arg0)
    printf " "
    set $i_$arg0 = $i_$arg0 - 1
  end
  sum_context_blocks $parent_$arg0
  set $child_$arg0 = ((MemoryContext) $parent_$arg0)->firstchild
  set $indent_$arg0 = $indent_$arg0 + 1
  while ($child_$arg0)
    walk_contexts $indent_$arg0 $child_$arg0
    set $child_$arg0 = ((MemoryContext) $child_$arg0)->nextchild
  end
end

walk_contexts 0 TopMemoryContext
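
The traversal the macro performs can be mirrored in a few lines of Python, which makes it easier to see why a parent line can show a smaller number than its child. This is only a sketch under stated assumptions: the Context class is a hypothetical stand-in for the firstchild/nextchild links the macro follows, the single-block lists are invented, and the parent/child pairing is the one read from the output above. Only the two byte totals come from the listing.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple

@dataclass
class Context:
    """Hypothetical stand-in for a memory context and its sibling links."""
    name: str
    block_sizes: List[int] = field(default_factory=list)
    firstchild: Optional["Context"] = None
    nextchild: Optional["Context"] = None

def own_bytes(ctx: Context) -> int:
    """What sum_context_blocks reports: this context's blocks only."""
    return sum(ctx.block_sizes)

def walk(ctx: Context, depth: int = 0) -> Iterator[Tuple[int, str, int]]:
    """Depth-first walk in the same order as walk_contexts."""
    yield depth, ctx.name, own_bytes(ctx)
    child = ctx.firstchild
    while child is not None:
        yield from walk(child, depth + 1)
        child = child.nextchild

def inclusive_bytes(ctx: Context) -> int:
    """Own blocks plus everything below; the macro does not print this."""
    total = own_bytes(ctx)
    child = ctx.firstchild
    while child is not None:
        total += inclusive_bytes(child)
        child = child.nextchild
    return total

# Illustrative pairing: a small parent with a larger child.
stmt = Context("unnamed prepared statement", [261920])
source = Context("CachedPlanSource", [67840], firstchild=stmt)

for depth, name, size in walk(source):
    print(" " * depth + f"{name}: {size}")
print("inclusive:", inclusive_bytes(source))  # inclusive: 329760
```

Each printed figure is the context's own allocation, so a per-subtree total has to be added up separately, as inclusive_bytes does here.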