Hi all,
This morning I was checking postgres servers logs, looking for
errors (we've recently upgraded them and changed default config)
and long running queries when I found one of the servers had
really big logs since yesterday. It was giving the error of this
mail's subject: out of memory, failed on request of size XXX on
automatic vacuum of table YYY. A quick search revealed me some
postgresql-lists messages talking about work_mem and
shared_buffers configuration options, some kernel config options
too. Although all of them were messages from several years ago, I
decided to cut my shared_buffers configured value and restart
server: now it looks like error is gone. But I'd like to
understand what's beyond the logged error (it's really long and
refers to things about inner functionalities that I'm missing),
how to detect what config options are possibly conflicting and,
most important, I want to know if I've solved it right.
-- Logged error (one of them) ==> Original log is in spanish
and I've translated myself. I've replaced my database objects
names
TopMemoryContext: 77568 total in 5 blocks; 15612 free (5 chunks);
61956 used
TopTransactionContext: 8192 total in 1 blocks; 7604 free (40
chunks); 588 used
TOAST to main relid map: 24576 total in 2 blocks; 13608 free (3
chunks); 10968 used
AV worker: 8192 total in 1 blocks; 3228 free (6 chunks); 4964
used
Autovacuum Portal: 0 total in 0 blocks; 0 free (0 chunks);
0 used => Is this line the one that points to the
task that has run of of memory?
Vacuum: 8192 total in 1 blocks; 8132 free (0 chunks); 60
used
Operator class cache: 8192 total in 1 blocks; 4420 free (0
chunks); 3772 used
smgr relation table: 8192 total in 1 blocks; 316 free (0
chunks); 7876 used
TransactionAbortContext: 32768 total in 1 blocks; 32748 free (0
chunks); 20 used
Portal hash: 8192 total in 1 blocks; 3460 free (0 chunks); 4732
used
PortalMemory: 0 total in 0 blocks; 0 free (0 chunks); 0 used
Relcache by OID: 8192 total in 1 blocks; 1884 free (0 chunks);
6308 used
CacheMemoryContext: 516096 total in 6 blocks; 234136 free (5
chunks); 281960 used
<index>: 1024 total in 1 blocks; 476 free (0 chunks);
548 used
<index>: 1024 total in 1 blocks; 716 free (0 chunks);
308 used
<index>: 1024 total in 1 blocks; 716 free (0 chunks);
308 used
<index>: 1024 total in 1 blocks; 716 free (0 chunks);
308 used
<index>: 1024 total in 1 blocks; 716 free (0 chunks);
308 used
<index>: 1024 total in 1 blocks; 716 free (0 chunks);
308 used
<index>: 1024 total in 1 blocks; 716 free (0 chunks);
308 used
<index>: 1024 total in 1 blocks; 716 free (0 chunks);
308 used
<index>: 1024 total in 1 blocks; 716 free (0 chunks);
308 used
<index>: 1024 total in 1 blocks; 716 free (0 chunks);
308 used
<index>: 1024 total in 1 blocks; 716 free (0 chunks);
308 used
<index>: 1024 total in 1 blocks; 716 free (0 chunks);
308 used
<index>: 1024 total in 1 blocks; 716 free (0 chunks);
308 used
<index>: 1024 total in 1 blocks; 716 free (0 chunks);
308 used
<index>: 1024 total in 1 blocks; 716 free (0 chunks);
308 used
<index>: 1024 total in 1 blocks; 716 free (0 chunks);
308 used
<index>: 1024 total in 1 blocks; 716 free (0 chunks);
308 used
<index>: 1024 total in 1 blocks; 716 free (0 chunks);
308 used
<index>: 1024 total in 1 blocks; 716 free (0 chunks);
308 used
<index>: 1024 total in 1 blocks; 716 free (0 chunks);
308 used
<index>: 1024 total in 1 blocks; 716 free (0 chunks);
308 used
<index>: 1024 total in 1 blocks; 716 free (0 chunks);
308 used
<index>: 1024 total in 1 blocks; 716 free (0 chunks);
308 used
<index>: 1024 total in 1 blocks; 716 free (0 chunks);
308 used
<index>: 1024 total in 1 blocks; 716 free (0 chunks);
308 used
<index>: 1024 total in 1 blocks; 716 free (0 chunks);
308 used
<index>: 1024 total in 1 blocks; 716 free (0 chunks);
308 used
pg_index_indrelid_index: 1024 total in 1 blocks; 676 free (0
chunks); 348 used
pg_attrdef_adrelid_adnum_index: 1024 total in 1 blocks; 604
free (0 chunks); 420 used
pg_db_role_setting_databaseid_rol_index: 1024 total in 1
blocks; 644 free (0 chunks); 380 used
pg_opclass_am_name_nsp_index: 1024 total in 1 blocks; 476 free
(0 chunks); 548 used
pg_foreign_data_wrapper_name_index: 1024 total in 1 blocks;
716 free (0 chunks); 308 used
pg_enum_oid_index: 1024 total in 1 blocks; 716 free (0
chunks); 308 used
pg_class_relname_nsp_index: 1024 total in 1 blocks; 644 free
(0 chunks); 380 used
pg_foreign_server_oid_index: 1024 total in 1 blocks; 716 free
(0 chunks); 308 used
pg_statistic_relid_att_inh_index: 1024 total in 1 blocks; 476
free (0 chunks); 548 used
pg_cast_source_target_index: 1024 total in 1 blocks; 644 free
(0 chunks); 380 used
pg_language_name_index: 1024 total in 1 blocks; 716 free (0
chunks); 308 used
pg_transform_oid_index: 1024 total in 1 blocks; 716 free (0
chunks); 308 used
pg_collation_oid_index: 1024 total in 1 blocks; 716 free (0
chunks); 308 used
pg_amop_fam_strat_index: 1024 total in 1 blocks; 476 free (0
chunks); 548 used
pg_index_indexrelid_index: 1024 total in 1 blocks; 676 free (0
chunks); 348 used
pg_ts_template_tmplname_index: 1024 total in 1 blocks; 644
free (0 chunks); 380 used
pg_ts_config_map_index: 1024 total in 1 blocks; 476 free (0
chunks); 548 used
pg_opclass_oid_index: 1024 total in 1 blocks; 676 free (0
chunks); 348 used
pg_foreign_data_wrapper_oid_index: 1024 total in 1 blocks; 716
free (0 chunks); 308 used
pg_event_trigger_evtname_index: 1024 total in 1 blocks; 716
free (0 chunks); 308 used
pg_ts_dict_oid_index: 1024 total in 1 blocks; 716 free (0
chunks); 308 used
pg_event_trigger_oid_index: 1024 total in 1 blocks; 716 free
(0 chunks); 308 used
pg_conversion_default_index: 1024 total in 1 blocks; 476 free
(0 chunks); 548 used
pg_operator_oprname_l_r_n_index: 1024 total in 1 blocks; 476
free (0 chunks); 548 used
pg_trigger_tgrelid_tgname_index: 1024 total in 1 blocks; 644
free (0 chunks); 380 used
pg_enum_typid_label_index: 1024 total in 1 blocks; 644 free (0
chunks); 380 used
pg_ts_config_oid_index: 1024 total in 1 blocks; 716 free (0
chunks); 308 used
pg_user_mapping_oid_index: 1024 total in 1 blocks; 716 free (0
chunks); 308 used
pg_opfamily_am_name_nsp_index: 1024 total in 1 blocks; 476
free (0 chunks); 548 used
pg_foreign_table_relid_index: 1024 total in 1 blocks; 716 free
(0 chunks); 308 used
pg_type_oid_index: 1024 total in 1 blocks; 716 free (0
chunks); 308 used
pg_aggregate_fnoid_index: 1024 total in 1 blocks; 716 free (0
chunks); 308 used
pg_constraint_oid_index: 1024 total in 1 blocks; 716 free (0
chunks); 308 used
pg_rewrite_rel_rulename_index: 1024 total in 1 blocks; 644
free (0 chunks); 380 used
pg_ts_parser_prsname_index: 1024 total in 1 blocks; 644 free
(0 chunks); 380 used
pg_ts_config_cfgname_index: 1024 total in 1 blocks; 644 free
(0 chunks); 380 used
pg_ts_parser_oid_index: 1024 total in 1 blocks; 716 free (0
chunks); 308 used
pg_operator_oid_index: 1024 total in 1 blocks; 716 free (0
chunks); 308 used
pg_namespace_nspname_index: 1024 total in 1 blocks; 716 free
(0 chunks); 308 used
pg_ts_template_oid_index: 1024 total in 1 blocks; 716 free (0
chunks); 308 used
pg_amop_opr_fam_index: 1024 total in 1 blocks; 476 free (0
chunks); 548 used
pg_default_acl_role_nsp_obj_index: 1024 total in 1 blocks; 476
free (0 chunks); 548 used
pg_collation_name_enc_nsp_index: 1024 total in 1 blocks; 476
free (0 chunks); 548 used
pg_range_rngtypid_index: 1024 total in 1 blocks; 716 free (0
chunks); 308 used
pg_ts_dict_dictname_index: 1024 total in 1 blocks; 644 free (0
chunks); 380 used
pg_type_typname_nsp_index: 1024 total in 1 blocks; 644 free (0
chunks); 380 used
pg_opfamily_oid_index: 1024 total in 1 blocks; 716 free (0
chunks); 308 used
pg_class_oid_index: 1024 total in 1 blocks; 676 free (0
chunks); 348 used
pg_proc_proname_args_nsp_index: 1024 total in 1 blocks; 476
free (0 chunks); 548 used
pg_transform_type_lang_index: 1024 total in 1 blocks; 644 free
(0 chunks); 380 used
pg_attribute_relid_attnum_index: 1024 total in 1 blocks; 604
free (0 chunks); 420 used
pg_proc_oid_index: 1024 total in 1 blocks; 716 free (0
chunks); 308 used
pg_language_oid_index: 1024 total in 1 blocks; 716 free (0
chunks); 308 used
pg_namespace_oid_index: 1024 total in 1 blocks; 676 free (0
chunks); 348 used
pg_amproc_fam_proc_index: 1024 total in 1 blocks; 436 free (0
chunks); 588 used
pg_foreign_server_name_index: 1024 total in 1 blocks; 716 free
(0 chunks); 308 used
pg_attribute_relid_attnam_index: 1024 total in 1 blocks; 644
free (0 chunks); 380 used
pg_conversion_oid_index: 1024 total in 1 blocks; 716 free (0
chunks); 308 used
pg_user_mapping_user_server_index: 1024 total in 1 blocks; 644
free (0 chunks); 380 used
pg_conversion_name_nsp_index: 1024 total in 1 blocks; 644 free
(0 chunks); 380 used
pg_authid_oid_index: 1024 total in 1 blocks; 676 free (0
chunks); 348 used
pg_auth_members_member_role_index: 1024 total in 1 blocks; 644
free (0 chunks); 380 used
pg_tablespace_oid_index: 1024 total in 1 blocks; 716 free (0
chunks); 308 used
pg_shseclabel_object_index: 1024 total in 1 blocks; 476 free
(0 chunks); 548 used
pg_replication_origin_roname_index: 1024 total in 1 blocks;
716 free (0 chunks); 308 used
pg_database_datname_index: 1024 total in 1 blocks; 676 free (0
chunks); 348 used
pg_replication_origin_roiident_index: 1024 total in 1 blocks;
716 free (0 chunks); 308 used
pg_auth_members_role_member_index: 1024 total in 1 blocks; 644
free (0 chunks); 380 used
pg_database_oid_index: 1024 total in 1 blocks; 676 free (0
chunks); 348 used
pg_authid_rolname_index: 1024 total in 1 blocks; 716 free (0
chunks); 308 used
WAL record construction: 49520 total in 2 blocks; 6876 free (0
chunks); 42644 used
PrivateRefCount: 8192 total in 1 blocks; 5516 free (0 chunks);
2676 used
MdSmgr: 8192 total in 1 blocks; 8004 free (0 chunks); 188 used
LOCALLOCK hash: 8192 total in 1 blocks; 2428 free (0 chunks);
5764 used
Timezones: 104028 total in 2 blocks; 5516 free (0 chunks); 98512
used
Postmaster: 8192 total in 1 blocks; 7796 free (8 chunks); 396
used
ident parser context: 0 total in 0 blocks; 0 free (0 chunks);
0 used
hba parser context: 7168 total in 3 blocks; 3148 free (1
chunks); 4020 used
ErrorContext: 8192 total in 1 blocks; 8172 free (4 chunks); 20
used
Grand total: 1009356 bytes in 130 blocks; 436888 free (72 chunks);
572468 used
[2019-11-17 09:27:37.425 CET] – 11899 ERROR: out of memory
[2019-11-17 09:27:37.425 CET] – 11899 DETAIL: Failed on request
of size 700378218.
[2019-11-17 09:27:37.425 CET] – 11899 CONTEXT: automatic vacuum
on table «my_db.public.my_table»
-- System specs:
Centos 6.5 i686.
12 GB RAM.
[root@myserver pg_log]# lscpu
Architecture: i686
CPU op-mode(s): 32-bit, 64-bit
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 1
Core(s) per socket: 4
Socket(s): 1
Vendor ID: GenuineIntel
[root@myserver pg_log]# psql postgres -c "select version();"
version
---------------------------------------------------------------------------------------------------------
PostgreSQL 9.6.15 on i686-pc-linux-gnu, compiled by gcc (GCC)
4.4.7 20120313 (Red Hat 4.4.7-23), 32-bit
(1 row)
-- Postgres configured values
[root@myserver data]# cat postgresql.conf | grep CHANGED
listen_addresses = '*' # what IP address(es) to listen on;
- CHANGED
port = 5432 # (change requires restart) - CHANGED
shared_buffers = 1024MB # min 128kB - Default
128MB - CHANGED ==> This value was 2048 when I was getting out
of memory error.
work_mem = 64MB # min 64kB - Default 4MB - CHANGED
maintenance_work_mem = 704MB # min 1MB - Default 64MB -
CHANGED
effective_io_concurrency = 2 # 1-1000; 0 disables
prefetching - Df 1 - CHANGED
wal_buffers = 16MB # min 32kB, -1 sets based on
shared_buffers - Df -1 - CHANGED
max_wal_size = 1GB # Default 1GB - CHANGED
min_wal_size = 512MB # Default 80MB - CHANGED
checkpoint_completion_target = 0.8 # checkpoint target
duration, 0.0 - 1.0 - Df 0.5 - CHANGED
effective_cache_size = 8448MB #Default 4GB - CHANGED
log_min_duration_statement = 30000 # -1 is disabled, 0 logs all
statements - Df -1 - CHANGED
log_line_prefix = '[%m] – %p %q- %u@%h:%d – %a ' # special
values: - CHANGED
I've now realized that work_mem is only 64MB, isn't it a bit low?
Please note that server has a 32bit OS installed (don't know
why). I'm aware of some limitations with memory configuration on
32bit systems [1]
Summary: how to interprete the log message, and is the error
controlled by the change made to shared_buffers value (from 2GB to
1GB).
Kind regards,
Ekaterina
[1] this post written by Robert Haas:
http://rhaas.blogspot.com/2011/05/sharedbuffers-on-32-bit-systems.html