Today the server observed very high load. I am pasting the top output below.
TOP
top - 12:45:23 up 79 days, 14:42, 1 user, load average: 45.84, 33.13, 25.84
Tasks: 394 total, 48 running, 346 sleeping, 0 stopped, 0 zombie
Cpu(s): 49.2%us, 0.8%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.1%si, 50.0%st
Mem: 16133676k total, 14870736k used, 1262940k free, 475484k buffers
Swap: 14466492k total, 124k used, 14466368k free, 11423616k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
4152 postgres 17 0 2436m 176m 171m R 16 1.1 0:03.09 postgres: postgres dbEnterpriser_09_10 192.168.10.
4122 postgres 17 0 2431m 20m 17m R 12 0.1 0:06.38 postgres: postgres dbEnterpriser_09_10 192.168.10.
4007 postgres 16 0 2434m 80m 75m R 11 0.5 0:26.46 postgres: postgres dbEnterpriser_09_10 192.168.10.
3994 postgres 16 0 2432m 134m 132m R 10 0.9 0:43.40 postgres: postgres dbEnterpriser_09_10 192.168.10.
4166 postgres 16 0 2433m 12m 8896 R 9 0.1 0:02.71 postgres: postgres dbEnterpriser_09_10 192.168.10.
4110 postgres 15 0 2436m 224m 217m S 8 1.4 0:06.83 postgres: postgres dbEnterpriser_09_10 192.168.10.
4061 postgres 16 0 2446m 491m 473m R 8 3.1 0:17.32 postgres: postgres dbEnterpriser_09_10 192.168.10.
4113 postgres 16 0 2432m 68m 65m R 8 0.4 0:11.03 postgres: postgres dbEnterpriser_09_10 192.168.10.
4071 postgres 16 0 2435m 200m 194m R 7 1.3 0:13.69 postgres: postgres dbEnterpriser_09_10 192.168.10.
4169 postgres 15 0 2436m 122m 117m R 7 0.8 0:00.93 postgres: postgres dbEnterpriser_09_10 192.168.10.
4178 postgres 16 0 2432m 77m 75m R 7 0.5 0:00.56 postgres: postgres dbEnterpriser_09_10 192.168.10.
4108 postgres 16 0 2437m 301m 293m R 6 1.9 0:11.94 postgres: postgres dbEnterpriser_09_10 192.168.10.
4155 postgres 16 0 2438m 252m 244m S 5 1.6 0:02.80 postgres: postgres dbEnterpriser_09_10 192.168.10.
4190 postgres 15 0 2432m 10m 8432 R 5 0.1 0:00.71 postgres: postgres dbEnterpriser_09_10 192.168.10.
3906 postgres 16 0 2433m 124m 119m R 5 0.8 0:57.28 postgres: postgres dbEnterpriser_09_10 192.168.10.
3970 postgres 16 0 2442m 314m 304m R 5 2.0 0:16.43 postgres: postgres dbEnterpriser_09_10 192.168.10.
4130 postgres 17 0 2433m 76m 72m R 5 0.5 0:03.76 postgres: postgres dbEnterpriser_09_10 192.168.10.
4179 postgres 16 0 2432m 105m 102m R 5 0.7 0:01.11 postgres: postgres dbEnterpriser_09_10 192.168.10.
4125 postgres 17 0 2436m 398m 391m R 4 2.5 0:05.62 postgres: postgres dbEnterpriser_09_10 192.168.10.
4162 postgres 16 0 2432m 125m 122m R 4 0.8 0:01.01 postgres: postgres dbEnterpriser_09_10 192.168.10.
OUTPUT OF IOSTAT 1 5 (is the SAN becoming a bottleneck? It shows 50% CPU usage)
clusternode2:~ # iostat 1 5
Linux 2.6.16.46-0.12-ppc64 (clusternode2) 09/24/2009 _ppc64_ (4 CPU)
avg-cpu: %user %nice %system %iowait %steal %idle
16.00 0.00 0.68 0.61 10.72 71.99
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 1.08 1.96 22.54 13505448 155494808
sdb 0.00 0.20 0.45 1410179 3099920
sdc 0.00 0.05 0.01 357404 78840
scd0 0.00 0.00 0.00 136 0
sdd 12.20 77.69 343.49 535925176 2369551848
sde 0.00 0.00 0.00 1120 0
avg-cpu: %user %nice %system %iowait %steal %idle
29.46 0.00 0.25 0.00 7.43 62.87
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 0.00 0.00 0.00 0 0
sdb 0.00 0.00 0.00 0 0
sdc 0.00 0.00 0.00 0 0
scd0 0.00 0.00 0.00 0 0
sdd 0.00 0.00 0.00 0 0
sde 0.00 0.00 0.00 0 0
avg-cpu: %user %nice %system %iowait %steal %idle
46.17 0.00 0.99 0.00 38.52 14.32
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 0.00 0.00 0.00 0 0
sdb 0.00 0.00 0.00 0 0
sdc 0.00 0.00 0.00 0 0
scd0 0.00 0.00 0.00 0 0
sdd 3.96 0.00 118.81 0 120
sde 0.00 0.00 0.00 0 0
avg-cpu: %user %nice %system %iowait %steal %idle
48.88 0.00 0.99 0.00 49.88 0.25
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 0.00 0.00 0.00 0 0
sdb 0.00 0.00 0.00 0 0
sdc 0.00 0.00 0.00 0 0
scd0 0.00 0.00 0.00 0 0
sdd 0.00 0.00 0.00 0 0
sde 0.00 0.00 0.00 0 0
avg-cpu: %user %nice %system %iowait %steal %idle
47.86 0.00 2.14 0.00 50.00 0.00
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 0.00 0.00 0.00 0 0
sdb 0.00 0.00 0.00 0 0
sdc 0.00 0.00 0.00 0 0
scd0 0.00 0.00 0.00 0 0
sdd 0.00 0.00 0.00 0 0
sde 0.00 0.00 0.00 0 0
Andy Colson wrote:
Eww. I think that's bad. A connection that has a transaction open will cause lots of row versions, which use up ram, and make it slower to step through the table (even with an index). You really need to fix up your code and make sure you commit transactions. (any statement (select, insert, update) will start a new transaction that you need to explicitly commit).
With reference to this suggestion by Andy Colson, we checked the application code and found that only the INSERT and UPDATE statements are followed by COMMIT; the SELECT statements are not committed. We are using a lot of "Ajax Suggest" in all the forms, which fetch data using SELECT statements that are never explicitly committed. We have started updating the code to fix this.
Thanks for this suggestion.
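To confirm which sessions those uncommitted SELECTs leave open, I am thinking of querying pg_stat_activity with something like the sketch below. This is only a rough sketch assuming PostgreSQL 8.x, where the backend pid column is named procpid and the state shows up in current_query; the psql flags in the comment are assumptions you would adjust for your setup.

```shell
# Sketch (assumes PostgreSQL 8.x column names: procpid, current_query).
# Emits the SQL that lists backends currently stuck "idle in transaction".
list_idle_in_txn_sql() {
  cat <<'SQL'
SELECT procpid, usename, query_start
  FROM pg_stat_activity
 WHERE current_query = '<IDLE> in transaction';
SQL
}

# In practice, pipe the query into psql, e.g. (connection flags are assumptions):
#   list_idle_in_txn_sql | psql -U postgres -d dbEnterpriser_09_10 -At
list_idle_in_txn_sql
```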
Thanks also to Scott Marlowe for the suggestion to reduce the number of connections. This has now been reduced to 500.
As I mentioned in the mail, I am restarting the database every 30 minutes. I found a shell script in the wiki which kills the "idle in transaction" PIDs: when more than 20 backends are idle in transaction, it kills the oldest one.
This is the script:
/usr/bin/test `/usr/bin/pgrep -f 'idle in transaction' | \
/usr/bin/wc -l` -gt 20 && /usr/bin/pkill -o -f 'idle in transaction'

and this is the link where the script was provided:
http://wiki.dspace.org/index.php/Idle_In_Transaction_Problem
I tried to run it as a test on the server, but the script is not executing. I also see that some of the "idle in transaction" PIDs are showing R (running) status, but most of them are showing S (sleeping) status. Please suggest any way I can resolve this idle-in-transaction issue.
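One thing worth checking (a hypothetical debugging sketch, not from the wiki page): pgrep -f and pkill -f match against the full command line, so the one-liner can only fire if ps actually shows "idle in transaction" in the postgres process titles. Counting the matches first shows whether the pattern is the reason the script never does anything:

```shell
# Sketch: count processes whose command line matches the pattern the
# wiki one-liner kills. Feed it `ps -eo args`; if this prints 0 while
# the database still has open transactions, the pkill pattern is why
# the script never fires (e.g. process titles are not being updated).
count_idle_in_txn() {
  grep -c 'idle in transaction' || true   # grep exits non-zero on 0 matches
}

ps -eo args | count_idle_in_txn
```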
Regards
Shiva Raman