Hi All,
I need some assistance with an out-of-memory issue I am currently
experiencing. Your thoughts would be greatly appreciated.
Configuration:
[1] 3 x ESX VMs
[a] 8 vCPUs each
[b] 16GB memory each
[2] CentOS 6.5 64-bit on each
[a] Kernel Rev: 2.6.32-431.17.1.el6.x86_64
[3] PostgreSQL from official repository
[a] Version 9.3.4
[4] Configured as a master-slave pacemaker/cman/pgsql cluster
[a] Pacemaker version: 1.1.10-14
[b] CMAN version: 3.0.12.1-59
[c] pgsql RA version: taken from the ClusterLabs git repo 3
months ago (can't find the version in the RA file)
I did not tune any OS IPC parameters, as I believe PostgreSQL 9.3
doesn't use those anymore (please correct me if I am wrong).
I have the following OS settings in place to try to get optimal use of
memory and smooth out fsync operations (comments may not be 100%
accurate :) ):
# Shrink FS cache before paging to swap
vm.swappiness = 0
# Don't hand out more memory than necessary
vm.overcommit_memory = 2
# Smooth out FS Sync
vm.dirty_ratio = 10
vm.dirty_background_ratio = 5
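In case it is relevant to the vm.overcommit_memory = 2 setting above,
here is a minimal sketch of how the kernel's commit limit could be
checked. It assumes the usual strict-overcommit formula (swap plus
overcommit_ratio percent of RAM, with vm.overcommit_ratio left at its
default of 50, which I have not verified on these hosts). The function
names are my own, and I have not confirmed that this limit is what is
actually being hit.

```python
GB_IN_KB = 1024 * 1024  # /proc/meminfo reports values in kB


def commit_limit_kb(mem_total_kb, swap_total_kb, overcommit_ratio=50, hugetlb_kb=0):
    """Commit limit applied when vm.overcommit_memory = 2.

    overcommit_ratio=50 is the kernel default and an assumption here.
    """
    return swap_total_kb + (mem_total_kb - hugetlb_kb) * overcommit_ratio // 100


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field_name: value_in_kB}."""
    info = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[name.strip()] = int(fields[0])
    return info


def commit_headroom_kb(info):
    """Address space that can still be committed before allocations fail."""
    return info["CommitLimit"] - info["Committed_AS"]
```

For 16GB of RAM and 21GB of swap, commit_limit_kb(16 * GB_IN_KB,
21 * GB_IN_KB) comes to 29GB. Passing the contents of /proc/meminfo to
parse_meminfo() and then to commit_headroom_kb() would show how close
Committed_AS is to CommitLimit, independent of how much physical
memory is free.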
I have the following memory-related settings for PostgreSQL:
work_mem = 1MB
maintenance_work_mem = 128MB
effective_cache_size = 6GB
max_connections = 700
shared_buffers = 4GB
temp_buffers = 8MB
wal_buffers = 16MB
max_stack_depth = 2MB
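As a back-of-the-envelope check on these settings, the following
sketch adds up a rough upper bound. It is only an approximation under
stated assumptions: each backend uses work_mem once and its full
temp_buffers, autovacuum_max_workers is at its default of 3, and
per-backend baseline overhead and catalog caches are ignored. The
function name and parameters are hypothetical.

```python
MB = 1024 * 1024
GB = 1024 * MB


def rough_peak_bytes(shared_buffers, work_mem, temp_buffers,
                     maintenance_work_mem, connections,
                     autovacuum_workers=3, work_mem_uses_per_backend=1):
    """Rough upper bound on PostgreSQL memory for the given settings.

    Assumes every backend uses work_mem `work_mem_uses_per_backend`
    times plus its full temp_buffers, which real workloads rarely do.
    """
    per_backend = work_mem * work_mem_uses_per_backend + temp_buffers
    return (shared_buffers
            + connections * per_backend
            + autovacuum_workers * maintenance_work_mem)


if __name__ == "__main__":
    for conns in (300, 700):
        total = rough_peak_bytes(4 * GB, 1 * MB, 8 * MB, 128 * MB, conns)
        print(conns, "connections:", total // MB, "MB")
```

With the values above this gives about 7180MB at 300 connections and
about 10780MB at the max_connections value of 700.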
Currently there are roughly 300 client connections active when this
error occurs.
What appears to happen is that an autovacuum process kicks off and
fails with an out of memory error. Shortly after that, the cluster
resource agent attempts a connection to template1 to check whether the
database is up, and this connection fails with an out of memory error
as well. At that point the cluster fails the database over to another
node.
Looking at the system memory usage when this error occurs, there is
roughly 4GB - 5GB of free physical memory, swap (21GB) is not in use at
all, and the page cache is roughly 3GB in size.
I have attached the two memory dump logs: the first error is related
to autovacuum, and the second is the cluster RA connection attempt,
which fails too. I do not know how to read that memory information to
come up with any ideas to correct this issue.
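To get at least a summary out of the dumps, a small parser along
these lines could total the "total" figure of every memory context and
list the largest ones. This is a sketch that assumes each context line
has exactly the format seen in the attached logs; the function names
are my own.

```python
import re

# Matches lines such as:
#   Analyze: 3377584 total in 10 blocks; 2384 free (28 chunks); 3375200 used
CONTEXT_RE = re.compile(
    r"^\s*(?P<name>.+?): (?P<total>\d+) total in (?P<blocks>\d+) blocks; "
    r"(?P<free>\d+) free \((?P<chunks>\d+) chunks\); (?P<used>\d+) used$"
)


def parse_contexts(log_text):
    """Return (name, total_bytes, used_bytes) for every context line."""
    rows = []
    for line in log_text.splitlines():
        match = CONTEXT_RE.match(line.rstrip())
        if match:
            rows.append((match.group("name"),
                         int(match.group("total")),
                         int(match.group("used"))))
    return rows


def summarize(log_text, top=5):
    """Return (sum of all context totals, the `top` largest contexts)."""
    rows = parse_contexts(log_text)
    grand_total = sum(total for _, total, _ in rows)
    largest = sorted(rows, key=lambda row: row[1], reverse=True)[:top]
    return grand_total, largest
```

In the first dump the largest single context is Analyze at 3377584
bytes. If the grand total is similarly small, that would suggest the
failing backend itself is not holding much memory, and that the failed
request is being refused for some other reason.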
The OS default for stack depth is 10MB. Should I attempt to increase
max_stack_depth to 10MB too?
The system does not appear to be running out of memory, so I'm wondering
if I have some issue with limits or some memory related settings.
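One way the limits question could be checked is by looking at the
per-process limits the kernel reports in /proc/<pid>/limits for the
postmaster. The sketch below parses that file's text and keeps the
memory-related limits that are not "unlimited". It assumes the usual
column layout of that file (fields separated by two or more spaces);
the function names are hypothetical.

```python
import re

MEMORY_LIMITS = (
    "Max address space",
    "Max data size",
    "Max stack size",
    "Max locked memory",
)


def parse_limits(text):
    """Parse /proc/<pid>/limits-style text into {name: (soft, hard)}."""
    limits = {}
    for line in text.splitlines()[1:]:  # skip the header row
        cols = re.split(r"\s{2,}", line.strip())
        if len(cols) >= 3:
            limits[cols[0]] = (cols[1], cols[2])
    return limits


def finite_memory_limits(limits, names=MEMORY_LIMITS):
    """Return the memory-related limits whose soft value is not unlimited."""
    return {name: limits[name]
            for name in names
            if name in limits and limits[name][0] != "unlimited"}
```

Feeding it the contents of /proc/<pid>/limits for the postmaster PID
would show whether an address space or data size limit is set for the
PostgreSQL processes.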
Any thoughts, tips, suggestions would be greatly appreciated.
If you need any additional info from me, please don't hesitate to ask.
Thanks
Bruce
TopMemoryContext: 171136 total in 13 blocks; 4128 free (5 chunks); 167008 used
Type information cache: 24240 total in 2 blocks; 3744 free (0 chunks); 20496 used
TopTransactionContext: 57344 total in 3 blocks; 21280 free (12 chunks); 36064 used
Analyze: 3377584 total in 10 blocks; 2384 free (28 chunks); 3375200 used
TOAST to main relid map: 24576 total in 2 blocks; 15984 free (5 chunks); 8592 used
AV worker: 8192 total in 1 blocks; 3048 free (6 chunks); 5144 used
Autovacuum Portal: 8192 total in 1 blocks; 8160 free (0 chunks); 32 used
Vacuum: 8192 total in 1 blocks; 8080 free (0 chunks); 112 used
Operator class cache: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used
smgr relation table: 24576 total in 2 blocks; 13920 free (4 chunks); 10656 used
TransactionAbortContext: 32768 total in 1 blocks; 32736 free (0 chunks); 32 used
Portal hash: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used
PortalMemory: 0 total in 0 blocks; 0 free (0 chunks); 0 used
Relcache by OID: 24576 total in 2 blocks; 13872 free (3 chunks); 10704 used
CacheMemoryContext: 827528 total in 21 blocks; 30168 free (1 chunks); 797360 used
sipoutboundproxy_idx: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
sipipaddr_idx: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
siphost_idx: 1024 total in 1 blocks; 64 free (0 chunks); 960 used
accountcode_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
sippeers_pkey: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
sippeers_name_key: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
pg_index_indrelid_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
pg_constraint_conrelid_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
pg_attrdef_adrelid_adnum_index: 1024 total in 1 blocks; 16 free (0 chunks); 1008 used
pg_db_role_setting_databaseid_rol_index: 1024 total in 1 blocks; 64 free (0 chunks); 960 used
pg_opclass_am_name_nsp_index: 3072 total in 2 blocks; 1736 free (2 chunks); 1336 used
pg_foreign_data_wrapper_name_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
pg_enum_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
pg_class_relname_nsp_index: 1024 total in 1 blocks; 64 free (0 chunks); 960 used
pg_foreign_server_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
pg_statistic_relid_att_inh_index: 3072 total in 2 blocks; 1784 free (2 chunks); 1288 used
pg_cast_source_target_index: 1024 total in 1 blocks; 16 free (0 chunks); 1008 used
pg_language_name_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
pg_collation_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
pg_amop_fam_strat_index: 3072 total in 2 blocks; 1736 free (2 chunks); 1336 used
pg_index_indexrelid_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
pg_ts_template_tmplname_index: 1024 total in 1 blocks; 64 free (0 chunks); 960 used
pg_ts_config_map_index: 3072 total in 2 blocks; 1784 free (2 chunks); 1288 used
pg_opclass_oid_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
pg_foreign_data_wrapper_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
pg_event_trigger_evtname_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
pg_ts_dict_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
pg_event_trigger_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
pg_conversion_default_index: 3072 total in 2 blocks; 1784 free (2 chunks); 1288 used
pg_operator_oprname_l_r_n_index: 3072 total in 2 blocks; 1784 free (2 chunks); 1288 used
pg_trigger_tgrelid_tgname_index: 1024 total in 1 blocks; 64 free (0 chunks); 960 used
pg_enum_typid_label_index: 1024 total in 1 blocks; 64 free (0 chunks); 960 used
pg_ts_config_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
pg_user_mapping_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
pg_opfamily_am_name_nsp_index: 3072 total in 2 blocks; 1784 free (2 chunks); 1288 used
pg_foreign_table_relid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
pg_type_oid_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
pg_aggregate_fnoid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
pg_constraint_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
pg_rewrite_rel_rulename_index: 1024 total in 1 blocks; 64 free (0 chunks); 960 used
pg_ts_parser_prsname_index: 1024 total in 1 blocks; 64 free (0 chunks); 960 used
pg_ts_config_cfgname_index: 1024 total in 1 blocks; 64 free (0 chunks); 960 used
pg_ts_parser_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
pg_operator_oid_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
pg_namespace_nspname_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
pg_ts_template_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
pg_amop_opr_fam_index: 3072 total in 2 blocks; 1784 free (2 chunks); 1288 used
pg_default_acl_role_nsp_obj_index: 3072 total in 2 blocks; 1784 free (2 chunks); 1288 used
pg_collation_name_enc_nsp_index: 3072 total in 2 blocks; 1784 free (2 chunks); 1288 used
pg_range_rngtypid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
pg_ts_dict_dictname_index: 1024 total in 1 blocks; 64 free (0 chunks); 960 used
pg_type_typname_nsp_index: 1024 total in 1 blocks; 64 free (0 chunks); 960 used
pg_opfamily_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
pg_class_oid_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
pg_proc_proname_args_nsp_index: 3072 total in 2 blocks; 1784 free (2 chunks); 1288 used
pg_attribute_relid_attnum_index: 1024 total in 1 blocks; 16 free (0 chunks); 1008 used
pg_proc_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
pg_language_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
pg_namespace_oid_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
pg_amproc_fam_proc_index: 3072 total in 2 blocks; 1736 free (2 chunks); 1336 used
pg_foreign_server_name_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
pg_attribute_relid_attnam_index: 1024 total in 1 blocks; 64 free (0 chunks); 960 used
pg_conversion_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
pg_user_mapping_user_server_index: 1024 total in 1 blocks; 64 free (0 chunks); 960 used
pg_conversion_name_nsp_index: 1024 total in 1 blocks; 64 free (0 chunks); 960 used
pg_authid_oid_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
pg_auth_members_member_role_index: 1024 total in 1 blocks; 64 free (0 chunks); 960 used
pg_tablespace_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
pg_database_datname_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
pg_auth_members_role_member_index: 1024 total in 1 blocks; 64 free (0 chunks); 960 used
pg_database_oid_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
pg_authid_rolname_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
MdSmgr: 8192 total in 1 blocks; 8032 free (0 chunks); 160 used
ident parser context: 0 total in 0 blocks; 0 free (0 chunks); 0 used
hba parser context: 31744 total in 5 blocks; 3440 free (0 chunks); 28304 used
LOCALLOCK hash: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used
Timezones: 83472 total in 2 blocks; 3744 free (0 chunks); 79728 used
Postmaster: 24576 total in 2 blocks; 24192 free (307 chunks); 384 used
ErrorContext: 8192 total in 1 blocks; 8160 free (5 chunks); 32 used
2014-06-16 11:22:04 IST [30081]: [1-1] db=,user= ERROR: out of memory
2014-06-16 11:22:04 IST [30081]: [2-1] db=,user= DETAIL: Failed on request of size 410.
2014-06-16 11:22:04 IST [30081]: [3-1] db=,user= CONTEXT: automatic analyze of table "blueface-service.public.sipaccounts"
TopMemoryContext: 171184 total in 13 blocks; 7984 free (6 chunks); 163200 used
smgr relation table: 24576 total in 2 blocks; 13920 free (4 chunks); 10656 used
TopTransactionContext: 8192 total in 1 blocks; 6432 free (5 chunks); 1760 used
TransactionAbortContext: 32768 total in 1 blocks; 32736 free (0 chunks); 32 used
Portal hash: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used
PortalMemory: 0 total in 0 blocks; 0 free (0 chunks); 0 used
Relcache by OID: 8192 total in 1 blocks; 640 free (0 chunks); 7552 used
CacheMemoryContext: 555696 total in 19 blocks; 272 free (3 chunks); 555424 used
pg_opfamily_am_name_nsp_index: 3072 total in 2 blocks; 1784 free (2 chunks); 1288 used
pg_foreign_table_relid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
pg_type_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
pg_aggregate_fnoid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
pg_constraint_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
pg_rewrite_rel_rulename_index: 1024 total in 1 blocks; 64 free (0 chunks); 960 used
pg_ts_parser_prsname_index: 1024 total in 1 blocks; 64 free (0 chunks); 960 used
pg_ts_config_cfgname_index: 1024 total in 1 blocks; 64 free (0 chunks); 960 used
pg_ts_parser_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
pg_operator_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
pg_namespace_nspname_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
pg_ts_template_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
pg_amop_opr_fam_index: 3072 total in 2 blocks; 1784 free (2 chunks); 1288 used
pg_default_acl_role_nsp_obj_index: 3072 total in 2 blocks; 1784 free (2 chunks); 1288 used
pg_collation_name_enc_nsp_index: 3072 total in 2 blocks; 1784 free (2 chunks); 1288 used
pg_range_rngtypid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
pg_ts_dict_dictname_index: 1024 total in 1 blocks; 64 free (0 chunks); 960 used
pg_type_typname_nsp_index: 1024 total in 1 blocks; 64 free (0 chunks); 960 used
pg_opfamily_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
pg_class_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
pg_proc_proname_args_nsp_index: 3072 total in 2 blocks; 1784 free (2 chunks); 1288 used
pg_attribute_relid_attnum_index: 1024 total in 1 blocks; 64 free (0 chunks); 960 used
pg_proc_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
pg_language_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
pg_namespace_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
pg_amproc_fam_proc_index: 3072 total in 2 blocks; 1784 free (2 chunks); 1288 used
pg_foreign_server_name_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
pg_attribute_relid_attnam_index: 1024 total in 1 blocks; 64 free (0 chunks); 960 used
pg_conversion_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
pg_user_mapping_user_server_index: 1024 total in 1 blocks; 64 free (0 chunks); 960 used
pg_conversion_name_nsp_index: 1024 total in 1 blocks; 64 free (0 chunks); 960 used
pg_authid_oid_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
pg_auth_members_member_role_index: 1024 total in 1 blocks; 64 free (0 chunks); 960 used
pg_tablespace_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
pg_database_datname_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
pg_auth_members_role_member_index: 1024 total in 1 blocks; 64 free (0 chunks); 960 used
pg_database_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
pg_authid_rolname_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
MdSmgr: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ident parser context: 0 total in 0 blocks; 0 free (0 chunks); 0 used
hba parser context: 31744 total in 5 blocks; 3440 free (0 chunks); 28304 used
LOCALLOCK hash: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used
Timezones: 83472 total in 2 blocks; 3744 free (0 chunks); 79728 used
Postmaster: 24576 total in 2 blocks; 23664 free (307 chunks); 912 used
ErrorContext: 8192 total in 1 blocks; 8160 free (5 chunks); 32 used
2014-06-16 11:23:12 IST [1682]: [3-1] db=template1,user=postgres FATAL: out of memory
2014-06-16 11:23:12 IST [1682]: [4-1] db=template1,user=postgres DETAIL: Failed on request of size 304.