I work at a public company that uses only open source applications and databases.
I have a problem with our critical database, which is both write- and read-intensive.
Version: PostgreSQL 9.4
Hardware: HP DL980 (8-processor, 80 cores w/o hyper threading, 512GB RAM)
Operating system: Red Hat Enterprise Linux Server release 6.4 (Santiago)
uname -a : Linux host1 2.6.32-358.el6.x86_64 #1 SMP Tue Jan 29 11:47:41 EST 2013 x86_64 x86_64 x86_64 GNU/Linux
Single database with separate tablespaces for main data, pg_xlog and indexes
The database is currently 770GB and is expected to grow to 2TB within the next year.
The database used to run on a 2-processor HP DL560 (16 cores). As the number of transactions kept increasing, we moved it to the DL980 with 8 processors and 512GB RAM.
Problem
At times during moderate load, CPU usage goes up to 400% and users are not able to complete their queries in the expected time. The load is contributed by system processes only (the %system column in the sar output), not by user time.
There are normally about 50 connections on average, but when this happens the connection count shoots up to max_connections.
The sar command output
07:20:01 IST CPU %user %nice %system %iowait %steal %idle
07:30:01 IST all 0.73 0.00 0.37 0.58 0.00 98.33
07:40:01 IST all 0.66 0.00 0.38 0.65 0.00 98.31
07:50:01 IST all 0.27 0.00 0.27 0.01 0.00 99.45
08:00:01 IST all 0.52 0.00 0.37 0.01 0.00 99.10
08:10:01 IST all 1.54 0.00 0.70 0.02 0.00 97.74
08:20:01 IST all 1.20 0.00 0.67 0.02 0.00 98.10
08:30:01 IST all 1.48 0.00 0.77 0.03 0.00 97.72
08:40:01 IST all 1.69 0.00 0.89 0.04 0.00 97.39
08:50:01 IST all 1.71 0.00 0.94 0.04 0.00 97.31
09:00:01 IST all 1.74 0.00 0.92 0.03 0.00 97.31
09:10:01 IST all 2.32 0.00 1.06 0.04 0.00 96.58
09:20:01 IST all 2.22 0.00 1.17 0.04 0.00 96.57
09:30:02 IST all 2.20 0.00 6.68 0.06 0.00 91.06
09:40:01 IST all 2.43 0.00 1.37 0.06 0.00 96.14
09:50:01 IST all 3.23 0.00 2.06 0.08 0.00 94.63
10:00:02 IST all 3.15 0.00 6.10 0.07 0.00 90.67
10:10:01 IST all 4.94 0.00 5.20 0.29 0.00 89.57
10:20:01 IST all 5.10 0.00 2.13 0.34 0.00 92.43
10:30:01 IST all 5.60 0.00 2.42 0.18 0.00 91.80
10:40:01 IST all 5.28 0.00 14.37 0.19 0.00 80.16
10:50:01 IST all 4.52 0.00 28.48 0.23 0.00 66.77
11:00:01 IST all 5.25 0.00 9.02 0.18 0.00 85.55
11:10:01 IST all 5.77 0.00 4.96 0.27 0.00 89.00
11:20:01 IST all 5.70 0.00 2.74 0.19 0.00 91.37
11:30:01 IST all 5.72 0.00 5.91 0.20 0.00 88.17
11:40:01 IST all 5.66 0.00 2.81 0.37 0.00 91.15
11:50:01 IST all 5.90 0.00 8.80 0.10 0.00 85.19
12:00:01 IST all 6.44 0.00 3.40 0.13 0.00 90.03
12:10:01 IST all 7.18 0.00 4.52 0.11 0.00 88.18
12:20:02 IST all 4.40 0.00 37.84 0.07 0.00 57.70
12:30:01 IST all 5.66 0.00 2.98 0.10 0.00 91.26
12:40:01 IST all 5.74 0.00 3.05 0.11 0.00 91.10
Average: all 1.92 0.00 2.28 0.11 0.00 95.69
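To make the spikes easier to pick out, here is a minimal Python sketch that scans sar output in the layout shown above and lists the intervals where %system crosses a threshold. The function name `system_spikes` and the 10% threshold are arbitrary choices for illustration, and only a few rows from the output above are included as sample input.

```python
# Sample rows copied from the sar output above.
# Columns: time, timezone, CPU, %user, %nice, %system, %iowait, %steal, %idle
SAR_SAMPLE = """\
07:20:01 IST CPU %user %nice %system %iowait %steal %idle
10:30:01 IST all 5.60 0.00 2.42 0.18 0.00 91.80
10:40:01 IST all 5.28 0.00 14.37 0.19 0.00 80.16
10:50:01 IST all 4.52 0.00 28.48 0.23 0.00 66.77
11:00:01 IST all 5.25 0.00 9.02 0.18 0.00 85.55
12:20:02 IST all 4.40 0.00 37.84 0.07 0.00 57.70
Average: all 1.92 0.00 2.28 0.11 0.00 95.69
"""


def system_spikes(sar_text, threshold=10.0):
    """Return (time, %user, %system) for rows whose %system exceeds threshold."""
    spikes = []
    for line in sar_text.splitlines():
        fields = line.split()
        # Skip the header row, the "Average:" row and anything malformed.
        if len(fields) != 9 or fields[2] != "all":
            continue
        try:
            user, system = float(fields[3]), float(fields[5])
        except ValueError:
            continue
        if system > threshold:
            spikes.append((fields[0], user, system))
    return spikes


if __name__ == "__main__":
    for time, user, system in system_spikes(SAR_SAMPLE):
        print(f"{time}  %user={user:.2f}  %system={system:.2f}")
```

In every flagged interval %user stays around 4-5% while %system jumps, which is why I think the time is being spent in the kernel rather than in the queries themselves.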
postgresql.conf
max_connections = 500 (can be reduced)
shared_buffers = 8500MB
work_mem = 50MB
maintenance_work_mem = 8064MB
checkpoint_segments = 132
checkpoint_timeout = 30min
checkpoint_completion_target = 0.9
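For scale, a small Python sketch of the memory arithmetic implied by these settings on the 512GB machine. It is only a rough estimate: work_mem applies per sort or hash operation, so a single query can use several multiples of it, and the helper names below are made up for this illustration.

```python
RAM_MB = 512 * 1024        # 512GB RAM, from the hardware description
SHARED_BUFFERS_MB = 8500   # shared_buffers
WORK_MEM_MB = 50           # work_mem
MAX_CONNECTIONS = 500      # max_connections


def shared_buffers_fraction(shared_buffers_mb, ram_mb):
    """Fraction of physical RAM given to shared_buffers."""
    return shared_buffers_mb / ram_mb


def work_mem_estimate_mb(connections, work_mem_mb, ops_per_query=1):
    """Rough upper estimate: every connection runs ops_per_query
    sort/hash operations at once, each using a full work_mem."""
    return connections * work_mem_mb * ops_per_query


if __name__ == "__main__":
    frac = shared_buffers_fraction(SHARED_BUFFERS_MB, RAM_MB)
    print(f"shared_buffers is {frac:.1%} of RAM")
    print("work_mem at 50 connections: ",
          work_mem_estimate_mb(50, WORK_MEM_MB), "MB")
    print("work_mem at max_connections:",
          work_mem_estimate_mb(MAX_CONNECTIONS, WORK_MEM_MB), "MB")
```

Even when the connections shoot up to max_connections, this estimate stays far below the installed RAM, so simply running out of memory does not look like the obvious explanation.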
This overload happens 5-6 times a day.
How can I trace the cause of this problem?
My thoughts:
1. Something related to the NUMA system's memory management.
2. Something related to the size of shared_buffers.
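As a starting point for the first thought, here is a minimal Python sketch that prints a few kernel settings often looked at on large NUMA machines. The paths are assumptions about what exists on this kernel: the `redhat_transparent_hugepage` directory is where RHEL 6 exposes transparent huge page settings, the plain `transparent_hugepage` path is the upstream location, and any file that is missing is reported as such. The script only reads values and changes nothing.

```python
# Candidate settings to inspect. Which of these exist depends on the
# kernel and distribution, so treat this list as an assumption.
CANDIDATE_PATHS = [
    "/proc/sys/vm/zone_reclaim_mode",
    "/sys/kernel/mm/redhat_transparent_hugepage/enabled",  # RHEL 6 location
    "/sys/kernel/mm/redhat_transparent_hugepage/defrag",   # RHEL 6 location
    "/sys/kernel/mm/transparent_hugepage/enabled",         # upstream location
]


def read_setting(path):
    """Return the stripped contents of path, or None if it cannot be read."""
    try:
        with open(path) as handle:
            return handle.read().strip()
    except OSError:
        return None


def collect_settings(paths=CANDIDATE_PATHS):
    """Map each candidate path to its current value (None if unavailable)."""
    return {path: read_setting(path) for path in paths}


if __name__ == "__main__":
    for path, value in collect_settings().items():
        print(f"{path}: {value if value is not None else '(not present)'}")
```

Running it once while the system is calm and once during an overload, together with the sar data, would at least show how the machine is configured at the time the %system spikes appear.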
Please help
Ajayakumar.BS
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.