Diagnose memory leak in logical replication?

Dear all,

Hope everyone is doing well.

While playing with wal2json and Debezium we faced an issue where walsender processes consume more and more memory (several hundred megabytes per hour) until we restart them by stopping and restarting their consumers.

We first suspected something wrong in wal2json[1], but after more testing today we see the exact same behaviour with the pgoutput plugin.

The environment is Debian buster with packages from apt.postgresql.org (currently using 12.2-2.pgdg100+1):

postgresql-12 - 12.2-2.pgdg100+1
postgresql-12-wal2json - 2.2-1.pgdg100+1
postgresql-client-12 - 12.2-2.pgdg100+1
postgresql-client-common - 213.pgdg100+1
postgresql-common - 213.pgdg100+1

The steps to create the slot and consume it with the wal2json plugin are the following:

pg_recvlogical -d test_db -U test_u -p 5433 -W --slot test_slot --create-slot -P wal2json
pg_recvlogical -d test_db -U test_u -p 5433 -W --slot test_slot --start -f /dev/null

The steps to create the slot and consume it with the pgoutput plugin are the following:
In PostgreSQL:
CREATE PUBLICATION dbz_minimal_publication FOR TABLE public.xxx;

And then from the shell:
pg_recvlogical -d test_db --slot pgoutput_minimal_test_slot --create-slot -P pgoutput
pg_recvlogical -d test_db -U test_u -p 5433 -W --slot pgoutput_minimal_test_slot --start -h localhost -o proto_version=1 -o publication_names='dbz_minimal_publication' -f /dev/null
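
(For reference, the state of both slots can be checked at any time with a query along these lines; the exact column list here is just an example:)

psql -d test_db -U test_u -p 5433 -c "SELECT slot_name, plugin, active, restart_lsn FROM pg_replication_slots;"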


A few minutes after creating these slots (Thu Nov  5 12:55:45 2020):
Every 5.0s: ps eo user,pid,vsz,rss $(pgrep -f walsender)
USER       PID    VSZ   RSS
postgres  3080 11672352  916664 <- this is the walsender using the wal2json plugin
postgres  5740 11533412  760332 <- this is the walsender using the pgoutput plugin
...
postgres 14333 13011864 2246328 <- these are other walsenders using the wal2json plugin
postgres 14383 13011864 2244788
postgres 14384 13011864 2246316

Same thing about an hour later (Thu Nov  5 14:01:55 2020):
Every 5.0s: ps eo user,pid,vsz,rss $(pgrep -f walsender)
USER       PID    VSZ   RSS
postgres  3080 12153832 1400920 <- this is the walsender using the wal2json plugin
postgres  5740 12014716 1245524 <- this is the walsender using the pgoutput plugin
...
postgres 14333 13493032 2728520
postgres 14383 13493032 2727012
postgres 14384 13493032 2728672

Per the wal2json author's suggestion, and based on other posts on this list, we attached gdb to one of the walsender processes a few weeks ago and ran the following:
p (long)MemoryContextStats((long)TopMemoryContext)
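
(For reference, the full gdb sequence was roughly the following; the PID is just an example, and the statistics end up on the walsender's stderr, i.e. usually in the server log:)

gdb -p 3080
(gdb) p (long)MemoryContextStats((long)TopMemoryContext)
(gdb) detach
(gdb) quit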

It seems we have a lot of ReorderBufferToastHash under the Logical decoding context:

TopMemoryContext: 435000 total in 9 blocks; 29864 free (17 chunks); 405136 used
  Type information cache: 24352 total in 2 blocks; 2624 free (0 chunks); 21728 used
  pgstat TabStatusArray lookup hash table: 8192 total in 1 blocks; 1456 free (0 chunks); 6736 used
  wal2json output context: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
  RowDescriptionContext: 8192 total in 1 blocks; 6896 free (0 chunks); 1296 used
  MessageContext: 8192 total in 1 blocks; 6896 free (1 chunks); 1296 used
  Replication command context: 82104 total in 5 blocks; 25848 free (12 chunks); 56256 used
    Logical decoding context: 344176 total in 5 blocks; 26672 free (12 chunks); 317504 used
      snapshot builder context: 524288 total in 7 blocks; 172928 free (24 chunks); 351360 used
      ReorderBuffer: 32768 total in 3 blocks; 28048 free (26 chunks); 4720 used
        ReorderBufferToastHash: 524288 total in 7 blocks; 28784 free (22 chunks); 495504 used
        ReorderBufferToastHash: 524288 total in 7 blocks; 28784 free (22 chunks); 495504 used
        ReorderBufferToastHash: 524288 total in 7 blocks; 28784 free (22 chunks); 495504 used
        ReorderBufferToastHash: 524288 total in 7 blocks; 28784 free (22 chunks); 495504 used
...
        ReorderBufferByXid: 16384 total in 2 blocks; 3544 free (2 chunks); 12840 used
        Tuples: 2080374888 total in 248 blocks (1259603 chunks); 241400 free (265927 chunks); 2080133488 used
        TXN: 648 total in 0 blocks; 0 free (0 chunks); 648 used
        Change: 79881960 total in 9751 blocks; 74080 free (926 chunks); 79807880 used
  Operator class cache: 8192 total in 1 blocks; 560 free (0 chunks); 7632 used

The full output can be found on GitHub[2].

Please note that all of this seems to be related to the actual database activity: a simple test such as running pgbench in a loop does not reproduce this "leak".
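
(The synthetic workload we tried was roughly of this kind; initialization scale, client count and duration are only examples:)

pgbench -i -s 10 test_db
while true; do pgbench -c 4 -T 60 test_db; done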

Could someone point me in the right direction to identify the reason for this behaviour, so that we can try to fix it?

Thanks,
Raph
[1] https://github.com/eulerto/wal2json/issues/180
[2] https://github.com/eulerto/wal2json/files/5292603/memory.txt
