For the past few days, we've been seeing unexpected high CPU spikes in our system. We observed the following:
- every single CPU spike was preceded by low 'free' memory even though 'cached' is quite high
- as soon as we shut down any of our applications which is occupying some DB connections (e.g., pgpool), the 'free' memory usage goes up and CPU load immediately drops (please see below)
- we saw instances when the ‘free’ memory did reach low values but CPU remained OK
I understand how running out of memory could cause various issues with the DB, but in this case, we had plenty of memory in the ‘cached’ portion. Why would CPU load go up when there's still plenty of room in the 'cached' memory?
Here's the session:
04:58:37 up 31 days, 23:41, 0 users, load average: 2.37, 1.91, 1.68
total used free shared buffers cached
Mem: 31720 31188 532 0 90 22852
(…)
05:00:37 up 31 days, 23:43, 1 user, load average: 5.51, 2.66, 1.95
total used free shared buffers cached
Mem: 31720 31452 268 0 77 22267
(…)
05:00:58 up 31 days, 23:44, 1 user, load average: 21.44, 6.52, 3.24
total used free shared buffers cached
Mem: 31720 31482 237 0 77 21704
(…)
05:01:18 up 31 days, 23:44, 1 user, load average: 42.98, 12.36, 5.22
total used free shared buffers cached
Mem: 31720 31477 243 0 77 21061
(…)
05:01:38 up 31 days, 23:44, 1 user, load average: 63.38, 18.99, 7.56
total used free shared buffers cached
Mem: 31720 31454 266 0 77 20410
(…)
05:03:20 up 31 days, 23:46, 1 user, load average: 110.10, 47.85, 19.07
total used free shared buffers cached
Mem: 31720 31326 394 0 76 19290
At this point, pgpool and apache were shut down:
05:03:40 up 31 days, 23:46, 1 user, load average: 113.51, 52.66, 21.26
total used free shared buffers cached
Mem: 31720 29835 1885 0 76 19291
(…)
05:04:00 up 31 days, 23:47, 1 user, load average: 82.49, 49.53, 20.90
total used free shared buffers cached
Mem: 31720 26082 5638 0 76 19300
(…)
05:04:20 up 31 days, 23:47, 1 user, load average: 60.37, 46.62, 20.56
total used free shared buffers cached
Mem: 31720 24701 7019 0 76 19311
(…)
05:04:40 up 31 days, 23:47, 1 user, load average: 43.63, 43.70, 20.15
total used free shared buffers cached
Mem: 31720 24797 6923 0 76 19320
(…)
05:05:00 up 31 days, 23:48, 1 user, load average: 31.70, 40.96, 19.75
total used free shared buffers cached
Mem: 31720 24947 6773 0 76 19326
(…)
05:05:20 up 31 days, 23:48, 1 user, load average: 23.12, 38.41, 19.36
total used free shared buffers cached
Mem: 31720 25036 6684 0 76 19334
(…)
05:05:40 up 31 days, 23:48, 1 user, load average: 17.12, 36.05, 18.99
total used free shared buffers cached
Mem: 31720 25197 6523 0 76 19340
(…)
05:06:00 up 31 days, 23:49, 1 user, load average: 12.84, 33.84, 18.63
total used free shared buffers cached
Mem: 31720 25316 6404 0 76 19367
(…)
05:06:20 up 31 days, 23:49, 1 user, load average: 9.85, 31.80, 18.28
total used free shared buffers cached
Mem: 31720 24728 6992 0 76 18839
(…)
05:06:40 up 31 days, 23:49, 1 user, load average: 7.61, 29.86, 17.93
total used free shared buffers cached
Mem: 31720 24835 6885 0 76 18847
(…)
05:07:00 up 31 days, 23:50, 1 user, load average: 5.74, 27.99, 17.57
total used free shared buffers cached
Mem: 31720 24971 6749 0 76 18852
(…)
05:07:20 up 31 days, 23:50, 1 user, load average: 4.48, 26.26, 17.22
total used free shared buffers cached
Mem: 31720 25133 6587 0 76 18861
(…)
05:07:40 up 31 days, 23:50, 2 users, load average: 3.83, 24.70, 16.90
total used free shared buffers cached
Mem: 31720 25351 6369 0 76 18872
(…)
05:08:00 up 31 days, 23:51, 2 users, load average: 3.10, 23.18, 16.56
total used free shared buffers cached
Mem: 31720 25334 6385 0 76 18879
(…)
05:08:20 up 31 days, 23:51, 2 users, load average: 2.52, 21.75, 16.23
total used free shared buffers cached
Mem: 31720 25362 6358 0 77 18884
Here are the pertinent machine and OS and Postgres details:
PostgreSQL 9.1.11 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-3), 64-bit
Linux ps2db 2.6.32-431.11.2.el6.x86_64 #1 SMP Tue Mar 25 19:59:55 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
postgres=# SELECT name, current_setting(name), source
postgres-# FROM pg_settings
postgres-# WHERE source NOT IN ('default', 'override');
name | current_setting | source
------------------------------+-------------------------------+----------------------
application_name | psql | client
archive_command | /bin/true | configuration file
archive_mode | on | configuration file
autovacuum_analyze_threshold | 50 | configuration file
autovacuum_freeze_max_age | 800000000 | configuration file
autovacuum_naptime | 5min | configuration file
autovacuum_vacuum_threshold | 50 | configuration file
bytea_output | escape | configuration file
checkpoint_completion_target | 0.7 | configuration file
checkpoint_segments | 128 | configuration file
checkpoint_timeout | 15min | configuration file
checkpoint_warning | 30s | configuration file
client_encoding | UTF8 | client
constraint_exclusion | partition | configuration file
cpu_index_tuple_cost | 0.005 | configuration file
cpu_operator_cost | 0.0025 | configuration file
cpu_tuple_cost | 0.01 | configuration file
custom_variable_classes | pg_stat_statements | configuration file
DateStyle | ISO, MDY | configuration file
default_statistics_target | 100 | configuration file
default_text_search_config | pg_catalog.english | configuration file
effective_cache_size | 16GB | configuration file
effective_io_concurrency | 1 | configuration file
enable_material | off | configuration file
escape_string_warning | on | configuration file
hot_standby | on | configuration file
lc_messages | C | configuration file
lc_monetary | en_US.UTF-8 | configuration file
lc_numeric | en_US.UTF-8 | configuration file
lc_time | en_US.UTF-8 | configuration file
listen_addresses | * | configuration file
log_autovacuum_min_duration | 0 | configuration file
log_checkpoints | on | configuration file
log_connections | on | configuration file
log_destination | csvlog | configuration file
log_directory | pg_log | configuration file
log_disconnections | on | configuration file
log_filename | postgresql.log.ps2db.%H | configuration file
log_line_prefix | %t [%d] [%u] [%p]: [%l-1] %h | configuration file
log_lock_waits | on | configuration file
log_min_duration_statement | 0 | configuration file
log_rotation_age | 1h | configuration file
log_temp_files | 0 | configuration file
log_timezone | Canada/Pacific | environment variable
log_truncate_on_rotation | on | configuration file
logging_collector | on | configuration file
maintenance_work_mem | 1GB | configuration file
max_connections | 500 | configuration file
max_locks_per_transaction | 512 | configuration file
max_stack_depth | 2MB | environment variable
max_standby_streaming_delay | 90min | configuration file
max_wal_senders | 6 | configuration file
pg_stat_statements.max | 10000 | configuration file
pg_stat_statements.track | all | configuration file
port | 5432 | configuration file
random_page_cost | 4 | configuration file
shared_buffers | 6GB | configuration file
shared_preload_libraries | pg_stat_statements | configuration file
standard_conforming_strings | off | configuration file
stats_temp_directory | /ram_postgres_stats | configuration file
temp_buffers | 16MB | configuration file
TimeZone | Canada/Pacific | environment variable
wal_keep_segments | 64 | configuration file
wal_level | hot_standby | configuration file
work_mem | 8MB | configuration file
(65 rows)
RAM: 32 gigs
CPU: 24 cores; Intel(R) Xeon(R) CPU X7460 @ 2.66GHz
RAID 10
- every single CPU spike was preceded by low 'free' memory even though 'cached' is quite high
- as soon as we shut down any of our applications which is occupying some DB connections (e.g., pgpool), the 'free' memory usage goes up and CPU load immediately drops (please see below)
- we saw instances when the ‘free’ memory did reach low values but CPU remained OK
I understand how running out of memory could cause various issues with the DB, but in this case, we had plenty of memory in the ‘cached’ portion. Why would CPU load go up when there's still plenty of room in the 'cached' memory?
Here's the session:
04:58:37 up 31 days, 23:41, 0 users, load average: 2.37, 1.91, 1.68
total used free shared buffers cached
Mem: 31720 31188 532 0 90 22852
(…)
05:00:37 up 31 days, 23:43, 1 user, load average: 5.51, 2.66, 1.95
total used free shared buffers cached
Mem: 31720 31452 268 0 77 22267
(…)
05:00:58 up 31 days, 23:44, 1 user, load average: 21.44, 6.52, 3.24
total used free shared buffers cached
Mem: 31720 31482 237 0 77 21704
(…)
05:01:18 up 31 days, 23:44, 1 user, load average: 42.98, 12.36, 5.22
total used free shared buffers cached
Mem: 31720 31477 243 0 77 21061
(…)
05:01:38 up 31 days, 23:44, 1 user, load average: 63.38, 18.99, 7.56
total used free shared buffers cached
Mem: 31720 31454 266 0 77 20410
(…)
05:03:20 up 31 days, 23:46, 1 user, load average: 110.10, 47.85, 19.07
total used free shared buffers cached
Mem: 31720 31326 394 0 76 19290
At this point, pgpool and apache were shut down:
05:03:40 up 31 days, 23:46, 1 user, load average: 113.51, 52.66, 21.26
total used free shared buffers cached
Mem: 31720 29835 1885 0 76 19291
(…)
05:04:00 up 31 days, 23:47, 1 user, load average: 82.49, 49.53, 20.90
total used free shared buffers cached
Mem: 31720 26082 5638 0 76 19300
(…)
05:04:20 up 31 days, 23:47, 1 user, load average: 60.37, 46.62, 20.56
total used free shared buffers cached
Mem: 31720 24701 7019 0 76 19311
(…)
05:04:40 up 31 days, 23:47, 1 user, load average: 43.63, 43.70, 20.15
total used free shared buffers cached
Mem: 31720 24797 6923 0 76 19320
(…)
05:05:00 up 31 days, 23:48, 1 user, load average: 31.70, 40.96, 19.75
total used free shared buffers cached
Mem: 31720 24947 6773 0 76 19326
(…)
05:05:20 up 31 days, 23:48, 1 user, load average: 23.12, 38.41, 19.36
total used free shared buffers cached
Mem: 31720 25036 6684 0 76 19334
(…)
05:05:40 up 31 days, 23:48, 1 user, load average: 17.12, 36.05, 18.99
total used free shared buffers cached
Mem: 31720 25197 6523 0 76 19340
(…)
05:06:00 up 31 days, 23:49, 1 user, load average: 12.84, 33.84, 18.63
total used free shared buffers cached
Mem: 31720 25316 6404 0 76 19367
(…)
05:06:20 up 31 days, 23:49, 1 user, load average: 9.85, 31.80, 18.28
total used free shared buffers cached
Mem: 31720 24728 6992 0 76 18839
(…)
05:06:40 up 31 days, 23:49, 1 user, load average: 7.61, 29.86, 17.93
total used free shared buffers cached
Mem: 31720 24835 6885 0 76 18847
(…)
05:07:00 up 31 days, 23:50, 1 user, load average: 5.74, 27.99, 17.57
total used free shared buffers cached
Mem: 31720 24971 6749 0 76 18852
(…)
05:07:20 up 31 days, 23:50, 1 user, load average: 4.48, 26.26, 17.22
total used free shared buffers cached
Mem: 31720 25133 6587 0 76 18861
(…)
05:07:40 up 31 days, 23:50, 2 users, load average: 3.83, 24.70, 16.90
total used free shared buffers cached
Mem: 31720 25351 6369 0 76 18872
(…)
05:08:00 up 31 days, 23:51, 2 users, load average: 3.10, 23.18, 16.56
total used free shared buffers cached
Mem: 31720 25334 6385 0 76 18879
(…)
05:08:20 up 31 days, 23:51, 2 users, load average: 2.52, 21.75, 16.23
total used free shared buffers cached
Mem: 31720 25362 6358 0 77 18884
Here are the pertinent machine and OS and Postgres details:
PostgreSQL 9.1.11 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-3), 64-bit
Linux ps2db 2.6.32-431.11.2.el6.x86_64 #1 SMP Tue Mar 25 19:59:55 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
postgres=# SELECT name, current_setting(name), source
postgres-# FROM pg_settings
postgres-# WHERE source NOT IN ('default', 'override');
name | current_setting | source
------------------------------+-------------------------------+----------------------
application_name | psql | client
archive_command | /bin/true | configuration file
archive_mode | on | configuration file
autovacuum_analyze_threshold | 50 | configuration file
autovacuum_freeze_max_age | 800000000 | configuration file
autovacuum_naptime | 5min | configuration file
autovacuum_vacuum_threshold | 50 | configuration file
bytea_output | escape | configuration file
checkpoint_completion_target | 0.7 | configuration file
checkpoint_segments | 128 | configuration file
checkpoint_timeout | 15min | configuration file
checkpoint_warning | 30s | configuration file
client_encoding | UTF8 | client
constraint_exclusion | partition | configuration file
cpu_index_tuple_cost | 0.005 | configuration file
cpu_operator_cost | 0.0025 | configuration file
cpu_tuple_cost | 0.01 | configuration file
custom_variable_classes | pg_stat_statements | configuration file
DateStyle | ISO, MDY | configuration file
default_statistics_target | 100 | configuration file
default_text_search_config | pg_catalog.english | configuration file
effective_cache_size | 16GB | configuration file
effective_io_concurrency | 1 | configuration file
enable_material | off | configuration file
escape_string_warning | on | configuration file
hot_standby | on | configuration file
lc_messages | C | configuration file
lc_monetary | en_US.UTF-8 | configuration file
lc_numeric | en_US.UTF-8 | configuration file
lc_time | en_US.UTF-8 | configuration file
listen_addresses | * | configuration file
log_autovacuum_min_duration | 0 | configuration file
log_checkpoints | on | configuration file
log_connections | on | configuration file
log_destination | csvlog | configuration file
log_directory | pg_log | configuration file
log_disconnections | on | configuration file
log_filename | postgresql.log.ps2db.%H | configuration file
log_line_prefix | %t [%d] [%u] [%p]: [%l-1] %h | configuration file
log_lock_waits | on | configuration file
log_min_duration_statement | 0 | configuration file
log_rotation_age | 1h | configuration file
log_temp_files | 0 | configuration file
log_timezone | Canada/Pacific | environment variable
log_truncate_on_rotation | on | configuration file
logging_collector | on | configuration file
maintenance_work_mem | 1GB | configuration file
max_connections | 500 | configuration file
max_locks_per_transaction | 512 | configuration file
max_stack_depth | 2MB | environment variable
max_standby_streaming_delay | 90min | configuration file
max_wal_senders | 6 | configuration file
pg_stat_statements.max | 10000 | configuration file
pg_stat_statements.track | all | configuration file
port | 5432 | configuration file
random_page_cost | 4 | configuration file
shared_buffers | 6GB | configuration file
shared_preload_libraries | pg_stat_statements | configuration file
standard_conforming_strings | off | configuration file
stats_temp_directory | /ram_postgres_stats | configuration file
temp_buffers | 16MB | configuration file
TimeZone | Canada/Pacific | environment variable
wal_keep_segments | 64 | configuration file
wal_level | hot_standby | configuration file
work_mem | 8MB | configuration file
(65 rows)
RAM: 32 gigs
CPU: 24 cores; Intel(R) Xeon(R) CPU X7460 @ 2.66GHz
RAID 10