Today the server observed very high load. I am pasting the top output below.
TOP
top - 12:45:23 up 79 days, 14:42, 1 user, load average: 45.84, 33.13, 25.84
Tasks: 394 total, 48 running, 346 sleeping, 0 stopped, 0 zombie
Cpu(s): 49.2%us, 0.8%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.1%si, 50.0%st
Mem: 16133676k total, 14870736k used, 1262940k free, 475484k buffers
Swap: 14466492k total, 124k used, 14466368k free, 11423616k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
4152 postgres 17 0 2436m 176m 171m R 16 1.1 0:03.09 postgres: postgres dbEnterpriser_09_10 192.168.10.
4122 postgres 17 0 2431m 20m 17m R 12 0.1 0:06.38 postgres: postgres dbEnterpriser_09_10 192.168.10.
4007 postgres 16 0 2434m 80m 75m R 11 0.5 0:26.46 postgres: postgres dbEnterpriser_09_10 192.168.10.
3994 postgres 16 0 2432m 134m 132m R 10 0.9 0:43.40 postgres: postgres dbEnterpriser_09_10 192.168.10.
4166 postgres 16 0 2433m 12m 8896 R 9 0.1 0:02.71 postgres: postgres dbEnterpriser_09_10 192.168.10.
4110 postgres 15 0 2436m 224m 217m S 8 1.4 0:06.83 postgres: postgres dbEnterpriser_09_10 192.168.10.
4061 postgres 16 0 2446m 491m 473m R 8 3.1 0:17.32 postgres: postgres dbEnterpriser_09_10 192.168.10.
4113 postgres 16 0 2432m 68m 65m R 8 0.4 0:11.03 postgres: postgres dbEnterpriser_09_10 192.168.10.
4071 postgres 16 0 2435m 200m 194m R 7 1.3 0:13.69 postgres: postgres dbEnterpriser_09_10 192.168.10.
4169 postgres 15 0 2436m 122m 117m R 7 0.8 0:00.93 postgres: postgres dbEnterpriser_09_10 192.168.10.
4178 postgres 16 0 2432m 77m 75m R 7 0.5 0:00.56 postgres: postgres dbEnterpriser_09_10 192.168.10.
4108 postgres 16 0 2437m 301m 293m R 6 1.9 0:11.94 postgres: postgres dbEnterpriser_09_10 192.168.10.
4155 postgres 16 0 2438m 252m 244m S 5 1.6 0:02.80 postgres: postgres dbEnterpriser_09_10 192.168.10.
4190 postgres 15 0 2432m 10m 8432 R 5 0.1 0:00.71 postgres: postgres dbEnterpriser_09_10 192.168.10.
3906 postgres 16 0 2433m 124m 119m R 5 0.8 0:57.28 postgres: postgres dbEnterpriser_09_10 192.168.10.
3970 postgres 16 0 2442m 314m 304m R 5 2.0 0:16.43 postgres: postgres dbEnterpriser_09_10 192.168.10.
4130 postgres 17 0 2433m 76m 72m R 5 0.5 0:03.76 postgres: postgres dbEnterpriser_09_10 192.168.10.
4179 postgres 16 0 2432m 105m 102m R 5 0.7 0:01.11 postgres: postgres dbEnterpriser_09_10 192.168.10.
4125 postgres 17 0 2436m 398m 391m R 4 2.5 0:05.62 postgres: postgres dbEnterpriser_09_10 192.168.10.
4162 postgres 16 0 2432m 125m 122m R 4 0.8 0:01.01 postgres: postgres dbEnterpriser_09_10 192.168.10.
OUTPUT OF IOSTAT 1 5 (is the SAN becoming a bottleneck? It shows 50% CPU usage)
clusternode2:~ # iostat 1 5
Linux 2.6.16.46-0.12-ppc64 (clusternode2) 09/24/2009 _ppc64_ (4 CPU)
avg-cpu: %user %nice %system %iowait %steal %idle
16.00 0.00 0.68 0.61 10.72 71.99
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 1.08 1.96 22.54 13505448 155494808
sdb 0.00 0.20 0.45 1410179 3099920
sdc 0.00 0.05 0.01 357404 78840
scd0 0.00 0.00 0.00 136 0
sdd 12.20 77.69 343.49 535925176 2369551848
sde 0.00 0.00 0.00 1120 0
avg-cpu: %user %nice %system %iowait %steal %idle
29.46 0.00 0.25 0.00 7.43 62.87
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 0.00 0.00 0.00 0 0
sdb 0.00 0.00 0.00 0 0
sdc 0.00 0.00 0.00 0 0
scd0 0.00 0.00 0.00 0 0
sdd 0.00 0.00 0.00 0 0
sde 0.00 0.00 0.00 0 0
avg-cpu: %user %nice %system %iowait %steal %idle
46.17 0.00 0.99 0.00 38.52 14.32
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 0.00 0.00 0.00 0 0
sdb 0.00 0.00 0.00 0 0
sdc 0.00 0.00 0.00 0 0
scd0 0.00 0.00 0.00 0 0
sdd 3.96 0.00 118.81 0 120
sde 0.00 0.00 0.00 0 0
avg-cpu: %user %nice %system %iowait %steal %idle
48.88 0.00 0.99 0.00 49.88 0.25
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 0.00 0.00 0.00 0 0
sdb 0.00 0.00 0.00 0 0
sdc 0.00 0.00 0.00 0 0
scd0 0.00 0.00 0.00 0 0
sdd 0.00 0.00 0.00 0 0
sde 0.00 0.00 0.00 0 0
avg-cpu: %user %nice %system %iowait %steal %idle
47.86 0.00 2.14 0.00 50.00 0.00
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 0.00 0.00 0.00 0 0
sdb 0.00 0.00 0.00 0 0
sdc 0.00 0.00 0.00 0 0
scd0 0.00 0.00 0.00 0 0
sdd 0.00 0.00 0.00 0 0
sde 0.00 0.00 0.00 0 0
Andy Colson wrote:
Eww. I think that's bad. A connection that has a transaction open will cause lots of row versions, which use up ram, and make it slower to step through the table (even with an index). You really need to fix up your code and make sure you commit transactions. (any statement (select, insert, update) will start a new transaction that you need to explicitly commit).
With reference to this suggestion by Andy Colson, we checked the application code and found that only the INSERT and UPDATE statements are followed by COMMIT; the SELECT statements are not committed. We are using a lot of "Ajax Suggest" in all the forms, which fetch data using SELECT statements that are never explicitly committed. We have started updating the code to fix this.
Thanks for this suggestion.
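To confirm which sessions those uncommitted SELECTs leave open, I am thinking of querying pg_stat_activity with something like the sketch below. This is only a rough sketch assuming PostgreSQL 8.x, where the backend pid column is named procpid and the state shows up in current_query; the psql flags in the comment are assumptions you would adjust for your setup.

```shell
# Sketch (assumes PostgreSQL 8.x column names: procpid, current_query).
# Emits the SQL that lists backends currently stuck "idle in transaction".
list_idle_in_txn_sql() {
  cat <<'SQL'
SELECT procpid, usename, query_start
  FROM pg_stat_activity
 WHERE current_query = '<IDLE> in transaction';
SQL
}

# In practice, pipe the query into psql, e.g. (connection flags are assumptions):
#   list_idle_in_txn_sql | psql -U postgres -d dbEnterpriser_09_10 -At
list_idle_in_txn_sql
```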
Thanks also to Scott Marlowe for the suggestion to reduce the number of connections. This has now been reduced to 500.
As I mentioned in the mail, I am restarting the database every 30 minutes. I found a shell script in the wiki which kills the "idle in transaction" PIDs: when more than 20 backends are idle in transaction, it kills the oldest one.
This is the script:
/usr/bin/test `/usr/bin/pgrep -f 'idle in transaction' | \
/usr/bin/wc -l` -gt 20 && /usr/bin/pkill -o -f 'idle in transaction'

and this is the link where the script was provided:
http://wiki.dspace.org/index.php/Idle_In_Transaction_Problem
I tried to run it as a test on the server, but the script is not executing. I also see that some of the "idle in transaction" PIDs are showing R (running) status, but most of them are showing S (sleeping) status. Please suggest any way I can resolve this idle-in-transaction issue.
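One thing worth checking (a hypothetical debugging sketch, not from the wiki page): pgrep -f and pkill -f match against the full command line, so the one-liner can only fire if ps actually shows "idle in transaction" in the postgres process titles. Counting the matches first shows whether the pattern is the reason the script never does anything:

```shell
# Sketch: count processes whose command line matches the pattern the
# wiki one-liner kills. Feed it `ps -eo args`; if this prints 0 while
# the database still has open transactions, the pkill pattern is why
# the script never fires (e.g. process titles are not being updated).
count_idle_in_txn() {
  grep -c 'idle in transaction' || true   # grep exits non-zero on 0 matches
}

ps -eo args | count_idle_in_txn
```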
Regards
Shiva Raman