I am hoping someone can help me out. I have a production server running CentOS 6.8 and PostgreSQL community edition 9.4.6. The application hitting the database is highly concurrent. When the database was running on the latest 6.7 CentOS OS I had no problems, we were seeing great throughput and my bottleneck was IO as I expected it to be. From the sar data my CPU overhead was ~10% and my disk usage was 99%. Memory was not using swap. After the SA upgraded the machine to 6.8 the profile changed. Now I am seeing a high CPU bound load on the system caused by postgres. disk usage is at ~50% and our application throughput has dropped dynamically. I was able to pull some vmstat and perf metrics which can be seen below. Any insight someone can offer would be great to return the system performance. Unfortunately rolling back to 6.7 is not an option because of security policy.
OS Info:
CentOS 6.8 Linux 2.6.32-642.1.1.el6.x86_64 #1 SMP Tue May 31 21:57:07 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
DB Info:
PostgreSQL 9.4.6
HW Info:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 16
On-line CPU(s) list: 0-15
Thread(s) per core: 2
Core(s) per socket: 4
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 44
Model name: Intel(R) Xeon(R) CPU X5687 @ 3.60GHz
Stepping: 2
CPU MHz: 3600.000
BogoMIPS: 7199.83
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 12288K
NUMA node0 CPU(s): 0,2,4,6,8,10,12,14
NUMA node1 CPU(s): 1,3,5,7,9,11,13,15
total used free shared buffers cached
Mem: 99052244 98472232 580012 8620496 47452 95264812
-/+ buffers/cache: 3159968 95892276
Swap: 4194300 11060 4183240
Top information
=================
top - 10:22:07 up 1 day, 1:13, 3 users, load average: 15.64, 17.16, 17.39
Tasks: 546 total, 18 running, 528 sleeping, 0 stopped, 0 zombie
Cpu(s): 91.7%us, 3.8%sy, 0.0%ni, 3.8%id, 0.6%wa, 0.0%hi, 0.1%si, 0.0%st
Mem: 99052244k total, 98744272k used, 307972k free, 47208k buffers
Swap: 4194300k total, 11064k used, 4183236k free, 95472720k cached
PID PPID TIME+ %CPU %MEM PR NI S VIRT SWAP RES UID COMMAND
21968 8814 995:06.03 86.6 1.1 20 0 R 8640m 272 1.1g 600 postgres: SELECT
21972 8814 994:41.35 85.9 1.1 20 0 R 8640m 272 1.1g 600 postgres: SELECT
20923 8814 994:53.56 85.0 1.8 20 0 R 8641m 296 1.7g 600 postgres: SELECT
7607 8814 16:39.25 80.1 8.0 20 0 R 8631m 268 7.6g 600 postgres: SELECT
21058 8814 956:59.78 79.1 3.1 20 0 R 8641m 296 2.9g 600 postgres: SELECT
7667 8814 17:22.53 77.1 0.9 20 0 R 8621m 288 855m 600 postgres: SELECT
20899 8814 995:15.39 76.5 4.4 20 0 R 8641m 292 4.2g 600 postgres: SELECT
7381 8814 28:59.06 75.8 0.9 20 0 R 8621m 288 865m 600 postgres: SELECT
21966 8814 995:13.87 74.5 4.4 20 0 R 8641m 272 4.2g 600 postgres: SELECT
7848 8814 10:22.82 73.2 0.9 20 0 R 8621m 288 840m 600 postgres: SELECT
7806 8814 13:50.43 71.9 0.9 20 0 S 8621m 288 849m 600 postgres: idle
7645 8814 18:08.60 70.9 0.9 20 0 R 8621m 288 852m 600 postgres: SELECT
7619 8814 19:34.71 70.3 0.9 20 0 R 8621m 288 856m 600 postgres: SELECT
7665 8814 17:40.34 69.6 0.9 20 0 S 8621m 288 855m 600 postgres: COMMIT
7843 8814 13:35.96 68.6 0.9 20 0 R 8621m 288 849m 600 postgres: SELECT
7264 8814 31:09.93 67.3 1.5 20 0 R 8633m 268 1.5g 600 postgres: SELECT
7675 8814 15:00.69 66.7 0.9 20 0 R 8621m 288 849m 600 postgres: SELECT
7574 8814 23:30.92 64.4 0.9 20 0 D 8621m 288 857m 600 postgres: SELECT
6977 8814 34:06.98 64.1 2.8 20 0 R 9106m 268 2.7g 600 postgres: SELECT
21973 8814 5:39.97 1.3 1.1 20 0 S 8640m 272 1.1g 600 postgres: idle
9212 1 13:13.12 1.0 0.0 20 0 S 44120 540 1140 600 /opt/pgbouncer/bin/pgbouncer -d /etc/pgbouncer/pgbouncer.ini
VMStat Information
=================
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
20 0 11072 749604 47000 95163200 0 0 75 230 18 26 82 0 15 2 0
15 0 11072 723244 47016 95176920 0 0 38 887 86610 33495 84 6 10 0 0
21 0 11072 708256 47024 95190704 0 0 28 3856 86047 23593 85 5 9 1 0
20 0 11072 702536 47040 95196160 0 0 1 3890 82383 10971 89 4 6 1 0
19 1 11072 696464 47048 95201504 0 0 17 522 84030 11061 90 4 5 0 0
21 0 11072 688740 47064 95206920 0 0 31 2238 84497 27806 87 5 8 0 0
perf top info
=================
Overhead Shared Object Symbol
12.83% postgres [.] heap_hot_search_buffer
6.82% postgres [.] ExecScanHashBucket
6.67% postgres [.] slot_deform_tuple
4.74% postgres [.] LWLockAcquire
3.81% postgres [.] pg_next_dst_boundary
3.43% postgres [.] texteq
2.97% postgres [.] _bt_checkkeys
2.79% postgres [.] j2date
2.76% postgres [.] LWLockRelease
2.69% postgres [.] slot_getattr
2.66% postgres [.] ExecMakeFunctionResultNoSets
2.48% postgres [.] DetermineTimeZoneOffsetInternal
2.25% postgres [.] hash_search_with_hash_value
2.04% postgres [.] tm2timestamp
2.04% postgres [.] ExecEvalParamExtern
1.99% postgres [.] dt2time
1.67% postgres [.] date2j
1.54% postgres [.] pgstat_end_function_usage
1.54% postgres [.] FunctionCall2Coll
1.52% postgres [.] ExecQual
1.48% postgres [.] timestamp2tm
1.43% postgres [.] index_fetch_heap
1.33% postgres [.] index_getnext_tid
1.22% postgres [.] _bt_next
1.18% postgres [.] ExecEvalNullTest
1.15% postgres [.] ExecScan
1.14% postgres [.] btgettuple
0.99% postgres [.] IndexNext
0.95% postgres [.] timestamp2timestamptz
0.94% postgres [.] ReleaseAndReadBuffer
0.93% postgres [.] PinBuffer
0.92% postgres [.] ExecEvalScalarVarFast
0.87% postgres [.] ExecStoreTuple
0.85% postgres [.] LockBuffer
0.82% postgres [.] heap_page_prune_opt
0.78% libc-2.12.so [.] __memcmp_sse4_1
0.74% postgres [.] toast_raw_datum_size
0.74% postgres [.] pg_detoast_datum_packed
0.72% postgres [.] ExecEvalRelabelType
0.68% postgres [.] CheckForSerializableConflictOut
0.68% postgres [.] HeapTupleSatisfiesMVCC
0.57% postgres [.] index_getnext
0.53% postgres [.] ReadBuffer_common
0.51% postgres [.] pgstat_init_function_usage
IO information
=====================
Tme DEV tps rd_sec/s wr_sec/s avgrq-sz avgqu-sz await svctm %util
08:00:01 dev253-5 59.13 1.89 1564.72 26.50 2.57 43.43 5.90 34.90
08:10:01 dev253-5 58.47 2.19 1555.84 26.65 2.20 37.59 6.05 35.38
08:20:01 dev253-5 60.72 36.18 1598.49 26.92 2.34 38.53 5.92 35.94
08:30:01 dev253-5 60.21 12.68 1562.41 26.16 2.18 36.21 5.97 35.97
08:40:01 dev253-5 60.77 367.49 2624.93 49.25 2.94 48.33 6.09 36.98
08:50:01 dev253-5 63.26 217.68 2644.81 45.25 5.41 85.45 5.82 36.80
09:00:01 dev253-5 60.33 1.41 1612.09 26.75 2.16 35.80 5.93 35.77
09:10:01 dev253-5 58.45 2.96 1586.15 27.19 3.29 56.23 6.12 35.77
09:20:01 dev253-5 89.26 1139.63 3323.51 50.00 3.78 42.35 4.51 40.28
09:30:01 dev253-5 92.84 1320.94 4093.75 58.32 4.21 45.30 4.15 38.57
09:40:01 dev253-5 82.81 998.29 4179.65 62.53 5.09 61.35 4.69 38.85
09:50:01 dev253-5 58.08 22.32 5549.46 95.93 5.22 89.88 6.14 35.67
10:00:01 dev253-5 60.98 50.19 10181.89 167.78 8.34 136.79 6.04 36.81
10:10:01 dev253-5 61.11 256.26 2992.49 53.16 3.12 51.10 5.61 34.29
Average: dev253-5 83.02 4394.40 2225.11 79.73 3.18 38.35 4.69 38.93
Bradley J. Hazuka
Database Administrator
Innovate!, Inc.