On Fri, Jan 10, 2014 at 11:09 AM, Steve Atkins <steve@xxxxxxxxxxx> wrote:
What are the exact error messages you’re getting, and where are you seeing them?
On Jan 10, 2014, at 8:35 AM, Preston Hagar <prestonh@xxxxxxxxx> wrote:
> tl;dr: Moved from 8.3 to 9.3 and are now getting out of memory errors despite the server now having 32 GB instead of 4 GB of RAM and the workload and number of clients remaining the same.
>
>
> Details:
>
> We have been using Postgresql for some time internally with much success. Recently, we completed a migration off of an older server running 8.3 to a new server running 9.3. The older server had 4GB of RAM, the new server has 32 GB.
>
> For some reason, since migrating we are getting lots of "out of memory" and "cannot allocate memory" errors on the new server when the server gets under a decent load. We have upped shmmax to 17179869184 and shmall to 4194304.
Thanks for the reply. I'm seeing them in the main postgresql log (/var/log/postgresql/postgresql-9.3-main.log)
Here are a couple of examples from the incident we had this morning:
2014-01-10 06:14:40 CST 30176 LOG: could not fork new process for connection: Cannot allocate memory
2014-01-10 06:14:40 CST 30176 LOG: could not fork new process for connection: Cannot allocate memory
TopMemoryContext: 160408 total in 19 blocks; 12984 free (41 chunks); 147424 used
TopTransactionContext: 8192 total in 1 blocks; 7392 free (0 chunks); 800 used
Btree proof lookup cache: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used
TableSpace cache: 8192 total in 1 blocks; 3216 free (0 chunks); 4976 used
Type information cache: 24240 total in 2 blocks; 3744 free (0 chunks); 20496 used
Operator lookup cache: 24576 total in 2 blocks; 11888 free (5 chunks); 12688 used
MessageContext: 524288 total in 7 blocks; 225240 free (3 chunks); 299048 used
Operator class cache: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used
smgr relation table: 57344 total in 3 blocks; 34320 free (10 chunks); 23024 used
TransactionAbortContext: 32768 total in 1 blocks; 32736 free (0 chunks); 32 used
Portal hash: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used
PortalMemory: 8192 total in 1 blocks; 7888 free (0 chunks); 304 used
PortalHeapMemory: 1024 total in 1 blocks; 768 free (0 chunks); 256 used
ExecutorState: 329080 total in 8 blocks; 105944 free (4 chunks); 223136 used
TupleSort: 32816 total in 2 blocks; 176 free (2 chunks); 32640 used
printtup: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 8192 total in 1 blocks; 8128 free (3 chunks); 64 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 8192 total in 1 blocks; 7896 free (0 chunks); 296 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
Relcache by OID: 24576 total in 2 blocks; 11792 free (3 chunks); 12784 used
CacheMemoryContext: 9815680 total in 25 blocks; 8143416 free (1 chunks); 1672264 used
pg_toast_3598032_index: 1024 total in 1 blocks; 16 free (0 chunks); 1008 used
mls_office_uid_idx: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
staff_primary: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
staff_desk_idx: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
person_primary: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
person_person_workphone_idx: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
person_person_mobilephone_idx: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
person_person_lastname_idx: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
person_person_homephone_idx: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
person_person_homeofficephone_idx: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
person_person_firstname_idx: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
agent_primary: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
agent_membertype_id: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
agent_floydtraining_id: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
agent_agent_sync_realtorid_idx: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
agent_agent_sync_oagentid_idx: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
agent_agent_personid_idx: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
agent_agent_license_idx: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
pg_toast_2619_index: 1024 total in 1 blocks; 16 free (0 chunks); 1008 used
mls_agent_uid_idx: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
mls_res_uidprp_idx: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
mls_res_uid_idx: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
mls_res_streetnum_idx: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
mls_res_streetname: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
mls_res_sqfttotal_idx: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
mls_res_ownername_idx: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
mls_res_officelist_idx: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
mls_res_modified: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
mls_res_mlsnumber_idx: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
mls_res_liststatus_idx: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
mls_res_listprice_idx: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
mls_res_city_idx: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
mls_res_beds_idx: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
mls_res_bathsfull_idx: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
mls_prp_subdivision: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
mls_prp_zipcode: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
mls_prp_uidprp: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
mls_prp_uid_idx: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
mls_prp_priority: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
mls_prp_officelist: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
mls_prp_modified: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
mls_prp_mlsnum: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
mls_prp_longitude_idx: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
mls_prp_liststatus: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
mls_prp_listprice: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
mls_prp_latitude_idx: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
mls_prp_ishud_idx: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
mls_prp_city_upcase_idx: 3072 total in 2 blocks; 1848 free (0 chunks); 1224 used
mls_prp_city: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
mls_prp_agentlist2: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
mls_prp_agentlist: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
mls_media_uid_idx: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
mls_media_tableuid: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
mls_media_mediasource_idx: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
mls_media_custompic_idx: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
listing_primary: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
listing_listing_sync_ilresid_idx: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
listing_listing_streetnumber_idx: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
listing_listing_streetname_idx: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
listing_listing_mlsnumber_idx: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
listing_listing_longitude_idx: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
listing_listing_listingid_idx: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
listing_listing_latitude_idx: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
listing_listing_brokerageid_idx: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
listing_listing_agentid_idx: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
listing_listdate_idx: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
code_primary: 1024 total in 1 blocks; 16 free (0 chunks); 1008 used
class_primary: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
pg_index_indrelid_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
pg_attrdef_adrelid_adnum_index: 1024 total in 1 blocks; 16 free (0 chunks); 1008 used
pg_db_role_setting_databaseid_rol_index: 1024 total in 1 blocks; 64 free (0 chunks); 960 used
pg_opclass_am_name_nsp_index: 3072 total in 2 blocks; 1736 free (2 chunks); 1336 used
pg_foreign_data_wrapper_name_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
pg_enum_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
pg_class_relname_nsp_index: 1024 total in 1 blocks; 16 free (0 chunks); 1008 used
pg_foreign_server_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
pg_statistic_relid_att_inh_index: 3072 total in 2 blocks; 1736 free (2 chunks); 1336 used
pg_cast_source_target_index: 1024 total in 1 blocks; 16 free (0 chunks); 1008 used
pg_language_name_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
pg_collation_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
pg_amop_fam_strat_index: 3072 total in 2 blocks; 1736 free (2 chunks); 1336 used
pg_index_indexrelid_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
pg_ts_template_tmplname_index: 1024 total in 1 blocks; 64 free (0 chunks); 960 used
pg_ts_config_map_index: 3072 total in 2 blocks; 1784 free (2 chunks); 1288 used
pg_opclass_oid_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
pg_foreign_data_wrapper_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
pg_event_trigger_evtname_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
pg_ts_dict_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
pg_event_trigger_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
pg_conversion_default_index: 3072 total in 2 blocks; 1784 free (2 chunks); 1288 used
pg_operator_oprname_l_r_n_index: 3072 total in 2 blocks; 1736 free (2 chunks); 1336 used
pg_trigger_tgrelid_tgname_index: 1024 total in 1 blocks; 16 free (0 chunks); 1008 used
pg_enum_typid_label_index: 1024 total in 1 blocks; 64 free (0 chunks); 960 used
pg_ts_config_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
pg_user_mapping_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
pg_opfamily_am_name_nsp_index: 3072 total in 2 blocks; 1784 free (2 chunks); 1288 used
pg_foreign_table_relid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
pg_type_oid_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
pg_aggregate_fnoid_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
pg_constraint_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
pg_rewrite_rel_rulename_index: 1024 total in 1 blocks; 64 free (0 chunks); 960 used
pg_ts_parser_prsname_index: 1024 total in 1 blocks; 64 free (0 chunks); 960 used
pg_ts_config_cfgname_index: 1024 total in 1 blocks; 64 free (0 chunks); 960 used
pg_ts_parser_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
pg_operator_oid_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
pg_namespace_nspname_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
pg_ts_template_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
pg_amop_opr_fam_index: 3072 total in 2 blocks; 1736 free (2 chunks); 1336 used
pg_default_acl_role_nsp_obj_index: 3072 total in 2 blocks; 1784 free (2 chunks); 1288 used
pg_collation_name_enc_nsp_index: 3072 total in 2 blocks; 1784 free (2 chunks); 1288 used
pg_range_rngtypid_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
pg_ts_dict_dictname_index: 1024 total in 1 blocks; 64 free (0 chunks); 960 used
pg_type_typname_nsp_index: 1024 total in 1 blocks; 16 free (0 chunks); 1008 used
pg_opfamily_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
pg_class_oid_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
pg_proc_proname_args_nsp_index: 3072 total in 2 blocks; 1736 free (2 chunks); 1336 used
pg_attribute_relid_attnum_index: 1024 total in 1 blocks; 16 free (0 chunks); 1008 used
pg_proc_oid_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
pg_language_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
pg_namespace_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
pg_amproc_fam_proc_index: 3072 total in 2 blocks; 1736 free (2 chunks); 1336 used
pg_foreign_server_name_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
pg_attribute_relid_attnam_index: 1024 total in 1 blocks; 64 free (0 chunks); 960 used
pg_conversion_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
pg_user_mapping_user_server_index: 1024 total in 1 blocks; 64 free (0 chunks); 960 used
pg_conversion_name_nsp_index: 1024 total in 1 blocks; 64 free (0 chunks); 960 used
pg_authid_oid_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
pg_auth_members_member_role_index: 1024 total in 1 blocks; 64 free (0 chunks); 960 used
pg_tablespace_oid_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
pg_database_datname_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
pg_auth_members_role_member_index: 1024 total in 1 blocks; 64 free (0 chunks); 960 used
pg_database_oid_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
pg_authid_rolname_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
MdSmgr: 8192 total in 1 blocks; 5440 free (0 chunks); 2752 used
ident parser context: 0 total in 0 blocks; 0 free (0 chunks); 0 used
hba parser context: 15360 total in 4 blocks; 6544 free (2 chunks); 8816 used
LOCALLOCK hash: 24576 total in 2 blocks; 13920 free (4 chunks); 10656 used
Timezones: 83472 total in 2 blocks; 3744 free (0 chunks); 79728 used
ErrorContext: 8192 total in 1 blocks; 8160 free (0 chunks); 32 used
2014-01-10 06:18:46 CST 10.1.1.6 16669 [unknown] production 10.1.1.6(36680)ERROR: out of memory
2014-01-10 06:18:46 CST 10.1.1.6 16669 [unknown] production 10.1.1.6(36680)DETAIL: Failed on request of size 500.
After some of the connections cleared up, (we load a lot of data around that time), things got "back to normal". Our Nagios monitoring showed a 30 minute window or so when no new connections could be made and the log is filled with more or less the same lines.
>That’s probably not related to the problems you’re seeing - I have apps that hold a connection to the database open for years. As long as it doesn’t keep a transaction open for a long time, you’re fine.
> We had originally copied our shared_buffers, work_mem, wal_buffers and other similar settings from our old config, but after getting the memory errors have tweaked them to the following:
>
> shared_buffers = 7680MB
> temp_buffers = 12MB
> max_prepared_transactions = 0
> work_mem = 80MB
> maintenance_work_mem = 1GB
> wal_buffers = 8MB
> max_connections = 350
>
> The current settings seem to have helped, but we are still occasionally getting the errors.
>
> The weird thing is that our old server had 1/8th the RAM, was set to max_connections = 600 and had the same clients connecting in the same way to the same databases and we never saw any errors like this in the several years we have been using it.
>
> One issue I could see is that one of our main applications that connects to the database, opens a connection on startup, holds it open the entire time it is running, and doesn't close it until the app is closed. In daily usage, for much of our staff it is opened first thing in the morning and left open all day (meaning the connection is held open for 8+ hours). This was never an issue with 8.3, but I know it isn't a "best practice" in general.
Good to know, thanks.
>Using pgbouncer is probably a good idea - to reduce the number of concurrent connections, rather than the length of connections, though.
> We are working to update our application to be able to use pgbouncer with transaction connections to try to alleviate the long held connections, but it will take some time.
I'm testing a updated version of our application against pgbouncer now so hopefully that will help some.
>Cheers,
> In the meantime, is there some other major difference or setting in 9.3 that we should look out for that could be causing this? Like I said, the same database with the same load and number of clients has been running on a 8.3 install for years (pretty much since 2008 when 8.3 was released) with lesser hardware with no issues.
>
> Let me know if any other information would help out or if anyone has suggestions of things to check.
Steve
Thanks for your help.
Preston