Shiva Raman wrote:
Hi
Today the load observed very high load . I am pasting the top.
_*TOP *_
top - 12:45:23 up 79 days, 14:42, 1 user, load average: 45.84, 33.13,
25.84
Tasks: 394 total, 48 running, 346 sleeping, 0 stopped, 0 zombie
Cpu(s): 49.2%us, 0.8%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.1%si,
50.0%st
Mem: 16133676k total, 14870736k used, 1262940k free, 475484k buffers
Swap: 14466492k total, 124k used, 14466368k free, 11423616k cached
_*OUTPUT OF IOSTAT 1 5 (is SAN becoming a bottleneck,shows 50% CPU
usage?) *_
clusternode2:~ # iostat 1 5
Linux 2.6.16.46-0.12-ppc64 (clusternode2) 09/24/2009 _ppc64_
(4 CPU)
avg-cpu: %user %nice %system %iowait %steal %idle
16.00 0.00 0.68 0.61 10.72 71.99
avg-cpu: %user %nice %system %iowait %steal %idle
29.46 0.00 0.25 0.00 7.43 62.87
avg-cpu: %user %nice %system %iowait %steal %idle
46.17 0.00 0.99 0.00 38.52 14.32
avg-cpu: %user %nice %system %iowait %steal %idle
48.88 0.00 0.99 0.00 49.88 0.25
avg-cpu: %user %nice %system %iowait %steal %idle
47.86 0.00 2.14 0.00 50.00 0.00
Both top and iostat show no wait time for io. However, they both show
wait time on the vm. You have 50% user and 50% steal, and zero% io.
you said: "SAN becoming a bottleneck,shows 50% CPU usage?"
I'm not sure what you are looking at. SAN is like HD right? I assume
waiting on the SAN would show up as %iowait... yes?
Andy Colson Wrote : ,
/Eww. I think that's bad. A connection that has a transaction open
will cause lots of row versions, which use up ram, and make it slower to
step through the table (even with an index). You really need to fix up
your code and make sure you commit transactions. (any statement
(select, insert, update) will start a new transaction that you need to
explicitly commit).
/With reference to this suggestion by Andy Colson, we checked the
application code and found that onlyINSERT, UPDATE has COMMIT and
SELECT has no commit, We are using a lot of "Ajax Suggest" in the all
the forms accessed for fetching the data using SELECT statements which
are not explicitly commited. We have started updating the code on this.
Thanks for this suggestion.
Again thanks to suggestion of Scott Marlowe in reducing the number of
connections. This was now reducted to 500 .
As i mentioned in the mail, i am restarting the database every 30
minutes. I found a shell script in the wiki which could the idle in
transaction pids. This is the code. The code will kill all old pids in
the server.
This is the script
/usr/bin/test `/usr/bin/pgrep -f 'idle in transaction' | \
/usr/bin/wc -l ` -gt 20 && /usr/bin/pkill -o -f 'idle in transaction'
and this is the link where the script was provided.
http://wiki.dspace.org/index.php/Idle_In_Transaction_Problem
I tried it run it as test in the server, but the script is not
executing. Even i see many of the "Idle in transaction " PIDs are
showing R (RUnning status) , but most of them are showing S(Sleep )
status. Please suggest anyway i can resolve this idle transaction issue.
fixing up the code to commit selects will make the "idle in trans." go
away. I'm with Praveen, fix the code, avoid the scripts.
Is there anything else running on this box? You said previously "The
PowerPC cpu is having some virtual layer that is shown in the Steal
value.". I'm not sure what that means. Are you in a virtual machine?
Or running other vm's? Based on the top you posted (this one and the
very first one) you are loosing half your cpu to the vm. (unless I'm
totally reading this wrong... I don't have experience with vm's so
please someone jump in here and correct me if I'm wrong)
-Andy
--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance