Hello,
On Mon, Jul 11, 2022 at 14:47, Pavel Stehule <pavel.stehule@xxxxxxxxx> wrote:
Hi, it looks like a memory leak.

Since we collect some per-process data for Postgres, I've found processes with unusually large anonymous memory (RssAnon) allocations; a sketch of how these snapshots are collected follows the data below.
Some examples:
2022-07-08T14:50:24+0200 Name: postmaster Pid: 31522 RssAnon: 742600 kB
2022-07-08T14:51:13+0200 Name: postmaster Pid: 31522 RssAnon: 742600 kB
2022-07-08T14:52:05+0200 Name: postmaster Pid: 31522 RssAnon: 742852 kB
2022-07-08T14:52:58+0200 Name: postmaster Pid: 31522 RssAnon: 743112 kB
2022-07-08T14:53:50+0200 Name: postmaster Pid: 31522 RssAnon: 743112 kB
2022-07-08T14:50:24+0200 Name: postmaster Pid: 31647 RssAnon: 684632 kB
2022-07-08T14:51:13+0200 Name: postmaster Pid: 31647 RssAnon: 684632 kB
2022-07-08T14:52:05+0200 Name: postmaster Pid: 31647 RssAnon: 684676 kB
2022-07-08T14:52:58+0200 Name: postmaster Pid: 31647 RssAnon: 684700 kB
2022-07-08T14:53:50+0200 Name: postmaster Pid: 31647 RssAnon: 684824 kB
2022-07-08T14:50:27+0200 Name: postmaster Pid: 7866 RssAnon: 1180960 kB
2022-07-08T14:51:16+0200 Name: postmaster Pid: 7866 RssAnon: 1180960 kB
2022-07-08T14:52:08+0200 Name: postmaster Pid: 7866 RssAnon: 1180960 kB
2022-07-08T14:53:01+0200 Name: postmaster Pid: 7866 RssAnon: 1180960 kB
2022-07-08T14:53:53+0200 Name: postmaster Pid: 7866 RssAnon: 1180960 kB
Other processes use less memory:
2022-07-08T14:52:08+0200 Name: postmaster Pid: 3869 RssAnon: 3256 kB
2022-07-08T14:53:02+0200 Name: postmaster Pid: 3869 RssAnon: 3256 kB
2022-07-08T14:53:54+0200 Name: postmaster Pid: 3869 RssAnon: 3256 kB
2022-07-08T14:50:27+0200 Name: postmaster Pid: 4217 RssAnon: 2532 kB
2022-07-08T14:51:16+0200 Name: postmaster Pid: 4217 RssAnon: 2532 kB
And shared memory (RssShmem):
2022-07-08T14:59:12+0200 Name: postmaster Pid: 5719 RssShmem: 908264 kB
2022-07-08T14:59:20+0200 Name: postmaster Pid: 5719 RssShmem: 908264 kB
2022-07-08T14:59:28+0200 Name: postmaster Pid: 5719 RssShmem: 908264 kB
2022-07-08T14:59:37+0200 Name: postmaster Pid: 5719 RssShmem: 908264 kB
2022-07-08T14:59:45+0200 Name: postmaster Pid: 5719 RssShmem: 908264 kB
2022-07-08T14:59:53+0200 Name: postmaster Pid: 5719 RssShmem: 908264 kB
2022-07-08T14:50:27+0200 Name: postmaster Pid: 5721 RssShmem: 1531656 kB
2022-07-08T14:51:16+0200 Name: postmaster Pid: 5721 RssShmem: 1531656 kB
2022-07-08T14:52:08+0200 Name: postmaster Pid: 5721 RssShmem: 1531656 kB
2022-07-08T14:53:01+0200 Name: postmaster Pid: 5721 RssShmem: 1531656 kB
2022-07-08T14:53:53+0200 Name: postmaster Pid: 5721 RssShmem: 1531656 kB
while other processes use far less shared memory:
2022-07-08T14:55:25+0200 Name: postmaster Pid: 8521 RssShmem: 1988 kB
2022-07-08T14:55:33+0200 Name: postmaster Pid: 8521 RssShmem: 1988 kB
2022-07-08T14:55:40+0200 Name: postmaster Pid: 8521 RssShmem: 2104 kB
2022-07-08T14:55:49+0200 Name: postmaster Pid: 8521 RssShmem: 2104 kB
2022-07-08T14:55:57+0200 Name: postmaster Pid: 8521 RssShmem: 2104 kB
2022-07-08T14:56:06+0200 Name: postmaster Pid: 8521 RssShmem: 2104 kB
2022-07-08T14:56:15+0200 Name: postmaster Pid: 8521 RssShmem: 2104 kB
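For reference, these snapshots are collected roughly once a minute from /proc/<pid>/status; a minimal sketch of the collector (the pgrep patterns and output formatting are simplified here and may need adjusting):

#!/bin/sh
# Sketch of the collector: one timestamped line per postgres process and field.
# Assumes Linux /proc and GNU date; on this build all processes report Name: postmaster.
for pid in $(pgrep -x postmaster; pgrep -x postgres); do
    for field in RssAnon RssShmem; do
        val=$(awk -v f="$field:" '$1 == f {print $2, $3}' /proc/"$pid"/status 2>/dev/null)
        [ -n "$val" ] && echo "$(date +%Y-%m-%dT%H:%M:%S%z) Name: postmaster Pid: $pid $field: $val"
    done
done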
Which processes are better candidates for investigation: the ones using more anonymous memory, or the ones using more shared memory?
I can do the (tedious) work to get a list of statements that a given PID executed (at least the statements that appear in the minute snapshots).
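To make that easier next time, I could also sample pg_stat_activity alongside the memory snapshots, so each PID can later be matched to whatever it was executing; just a sketch (the output path is arbitrary):

#!/bin/sh
# Sketch: once a minute, record pid, state and the current/last query of each
# client backend, so a growing PID can later be matched to its statements.
psql -X -At -d prematch -c "
    SELECT now(), pid, state, left(query, 200)
    FROM pg_stat_activity
    WHERE backend_type = 'client backend'
" >> /var/tmp/pg_activity_samples.log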
From the server log at the time of the out-of-memory error (the tail of the memory context dump, followed by the error itself):

ErrorContext: 8192 total in 1 blocks; 7928 free (5 chunks); 264 used
Grand total: 1456224 bytes in 195 blocks; 378824 free (165 chunks); 1077400 used
2022-07-02 14:48:07 CEST [3930]: [3-1] user=,db=,host=,app= ERROR: out of memory
2022-07-02 14:48:07 CEST [3930]: [4-1] user=,db=,host=,app= DETAIL: Failed on request of size 152094068 in memory context "TopTransactionContext".
2022-07-02 14:48:07 CEST [3930]: [5-1] user=,db=,host=,app= CONTEXT: automatic vacuum of table "prematch.replication.tab_queue_tmp"
2022-07-02 14:48:47 CEST [4476]: [42-1] user=,db=,host=,app= LOG: could not fork worker process: Cannot allocate memory
2022-07-02 14:48:47 CEST [4476]: [43-1] user=,db=,host=,app= LOG: could not fork worker process: Cannot allocate memory
terminate called after throwing an instance of 'std::bad_alloc'
  what(): std::bad_alloc
2022-07-02 14:48:47 CEST [4476]: [44-1] user=,db=,host=,app= LOG: background worker "parallel worker" (PID 4303) was terminated by signal 6: Aborted

But what you see is probably just a side effect. Do you use some extension written in C++? This is a C++ message. This process kills Postgres.
These extensions are installed (unchanged from 14.3):
List of installed extensions
Name | Version | Schema | Description
--------------------+---------+------------+------------------------------------------------------------------------
amcheck | 1.3 | public | functions for verifying relation integrity
dblink | 1.2 | public | connect to other PostgreSQL databases from within a database
file_fdw | 1.0 | public | foreign-data wrapper for flat file access
hstore | 1.8 | public | data type for storing sets of (key, value) pairs
hypopg | 1.3.1 | public | Hypothetical indexes for PostgreSQL
pageinspect | 1.9 | public | inspect the contents of database pages at a low level
pg_buffercache | 1.3 | public | examine the shared buffer cache
pg_stat_kcache | 2.2.0 | public | Kernel statistics gathering
pg_stat_statements | 1.9 | public | track planning and execution statistics of all SQL statements executed
pgcrypto | 1.3 | public | cryptographic functions
pgstattuple | 1.5 | public | show tuple-level statistics
plpgsql | 1.0 | pg_catalog | PL/pgSQL procedural language
plpgsql_check | 2.1 | public | extended check for plpgsql functions
postgres_fdw | 1.1 | public | foreign-data wrapper for remote PostgreSQL servers
tablefunc | 1.0 | public | functions that manipulate whole tables, including crosstab
(15 rows)
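To Pavel's question about C++: none of these looks like an obvious C++ extension, but I can check which of the server's loadable libraries link against libstdc++; a sketch, assuming the PGDG RPM layout under /usr/pgsql-14:

#!/bin/sh
# Sketch: list server libraries that pull in the C++ runtime.
# Assumes the library directory of the PGDG 14 packages.
for so in /usr/pgsql-14/lib/*.so; do
    ldd "$so" 2>/dev/null | grep -q 'libstdc++' && echo "$so"
done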
It may be related to some operation on prematch.replication.tab_queue_tmp, because in both cases an autovacuum of that table had been started.
On 14.3 (and older major versions) there were no such issues. The table is heavily updated, so autovacuum runs frequently on it.
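Next time one of the backends starts growing, I can also ask it to dump its memory contexts into the server log with pg_log_backend_memory_contexts() (available since 14, superuser only); a minimal sketch, using one of the PIDs from the RssAnon snapshots above:

#!/bin/sh
# Sketch: log the memory context tree of a suspect backend into the PostgreSQL log.
# The PID is just an example taken from the RssAnon snapshots above.
PID=31522
psql -X -d prematch -c "SELECT pg_log_backend_memory_contexts($PID);"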
hba parser context: 25600 total in 6 blocks; 4464 free (8 chunks); 21136 used
ErrorContext: 8192 total in 1 blocks; 7928 free (5 chunks); 264 used
Grand total: 1456224 bytes in 195 blocks; 378824 free (165 chunks); 1077400 used
2022-07-02 14:48:07 CEST [3930]: [3-1] user=,db=,host=,app= ERROR: out of memory
2022-07-02 14:48:07 CEST [3930]: [4-1] user=,db=,host=,app= DETAIL: Failed on request of size 152094068 in memory context "TopTransactionContext".
2022-07-02 14:48:07 CEST [3930]: [5-1] user=,db=,host=,app= CONTEXT: automatic vacuum of table "prematch.replication.tab_queue_tmp"

I am not sure, but it looks like the table prematch.replication.tab_queue_tmp might be corrupted.
$ /usr/pgsql-14/bin/pg_amcheck -v -d prematch -t replication.tab_queue_tmp --heapallindexed
pg_amcheck: including database "prematch"
pg_amcheck: in database "prematch": using amcheck version "1.3" in schema "public"
pg_amcheck: checking heap table "prematch.replication.tab_queue_tmp"
pg_amcheck: checking btree index "prematch.replication.tab_queue_tmp_pkey"
$ echo $?
0
Is there anything else I can check?
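For example, if it would help, I can re-run the check with the stricter btree options (a sketch; --parent-check takes a ShareLock on the index and its table, so it would have to run in a quiet window):

$ /usr/pgsql-14/bin/pg_amcheck -v -d prematch -t replication.tab_queue_tmp \
    --heapallindexed --parent-check --rootdescend
$ echo $?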
Thanks, Ales
And a few days later, again:

2022-07-08 14:54:22 CEST [4791]: [81186-1] user=app_evaluation,db=prematch,host=172.25.0.80,app=Evaluation_STS_UAT DETAIL: parameters: $1 = '1', $2 = '1748010445', $3 = '0', $4 = '1000'
terminate called after throwing an instance of 'std::bad_alloc'
terminate called after throwing an instance of 'std::bad_alloc'
  what(): std::bad_alloc
  what(): std::bad_alloc
2022-07-08 14:54:23 CEST [4476]: [49-1] user=,db=,host=,app= LOG: background worker "parallel worker" (PID 25251) was terminated by signal 6: Aborted
2022-07-08 14:54:23 CEST [4476]: [51-1] user=,db=,host=,app= LOG: terminating any other active server processes

Thanks, Ales Zeleny