A simple (perl) script was made to 'watch' the state transitions of
back ends. On startup It captures a set of pids for watching
and displays a visual representation of the states for next 30 intervals
of 1 seconds each. The X axis is interval cnt, Y axis is pid and the
origin is on top-left.
The state value can be Active Query (*) , or <IDLE> indicated by '.' or
'<IDLE> in transaction' indicated by '?' . for my server below is a random
output (during lean hours and on a lean day).
----------------------------------------------------------------------------------------------------
PID 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
----------------------------------------------------------------------------------------------------
4334 ? ? ? ? * ? ? ? ? ? * ? ? ? ? ? ? ? ?
6904 ? ? . . . * ? . . . . . . ? ? .
6951 ? ? ? . . . . ? ? ? ? ? . . . ? ? ? . . . ? . . . . . ? ? .
7009 ? * ? ? . . . . . . . . . * * . * ? ? . . . * ? ? ? . . . ?
7077 ? . ? . . . * . ? . . . . ? . . . . . . . ? . . ? . . . ? ?
7088 ? . . ? . ? ? ? . . . . . . ? . . ? ? * . . . . . ? . ? . *
7091 ? . . * ? ? ? ? ? ? ? * ? . . ? * . * . . . . . . . . . . .
7093 ? ? . ? . . . . ? . ? * . . . . . . . . . ? ? ? . ? ? ? . .
7112 * * . . . ? ? ? . . . . . . . . ? ? . ? . ? . . ? . . . . .
7135 ? . . * . ? ? ? . ? ? . . . ? . . . . . . . ? . . . ? ? . .
7142 ? . ? . . . . . . * . . . ? . . . . . . . . . . . . . .
7166 ? . ? ? ? * * . ? * . ? . . . ? . ? ? . . . * . . . ? . . .
8202 ? ? . . . * . ? . . . . . . . * ? . . . ? ? . . . . ? ? ? .
8223 ? . . . . . . ?
8237 ? ? ? . ? ? ? ? . . . . . . . ? . . . . . ? . . * ? . . . .
8251 ? . ? . . . . . ? ? . . . * ? . . . ? . . . . . . . . . . .
8278 ? ? . . . . ? . . . . . . . ? . . . . . . ? ? . . * . . . .
8290 ? . .
8294 ? ? . . . . . . . . . . . . ? . . . ? ? . . . . . . . . * *
8303 ? * ? . ? ? ? . ? ? ? . . . . * . . . . . . . . . . . . . .
8306 ? ? . . . ? . . . ? . . . . . . * . . .
8309 * ? ? ? ? . . . ? . . .
8329 ? . * * . . . . . . . * . ? . * . ? . * . * ? . . .
----------------------------------------------------------------------------------------------------
(*) Active Query , (.) Idle , (?) Idle in transaction,<blank> backend over.
----------------------------------------------------------------------------------------------------
Looks like most of the graph space is filled with (.) or (?) and very
less active queries (long running queries > 1s). on a busy day and busi hour i shall check the and post again. The script is presented which depends only on perl , DBI and DBD::Pg.
script pasted here:
http://pastebin.com/mrjSZfLB
Regds
mallah.
On Sat, Jun 26, 2010 at 3:23 PM, Rajesh Kumar Mallah <mallah.rajesh@xxxxxxxxx> wrote:
Dear List,
Today has been good since morning. Although it is a lean day
for us but the indications are nice. I thank everyone who shared
the concern. I think the most significant change has been to reduce
shared_buffers from 10G to 4G , this has lead to reduced memory
usage and some breathing space to the OS.
Although i am yet to incorporate the suggestions from pgtune but
i think the issue of max_connection needs to be addressed first.
I am investigating application issues and about the mechanism that
puts many backend to '<IDLE> in transaction ' mode for significant
times. I thank Tom for the script he sent. Once that resolves i shall
check pooling as suggested by Kevin, then eventually max_connections
can be reduced. I shall also check pgpool and pgbouncer if they are
helpful in this regard.
I observed that the number of simultaneous connection today (lean day)
hovers between 1 to 10 , occasionally shooting to 15 but never more than
20 i would say.
I am happy that i/o waits are negligible and cpu is idling also for a while.22 0 18468 954120 13460 28491772 0 0 568 1558 13645 18355 62 10 27 2 0
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
r b swpd free buff cache si so bi bo in cs us sy id wa st
16 0 18468 208100 13476 28469084 0 0 580 671 14039 17055 67 13 19 1 0
10 2 18812 329032 13400 28356972 0 46 301 1768 13848 17884 68 10 20 1 0
16 2 18812 366596 13416 28361620 0 0 325 535 13957 16649 72 11 16 1 0
50 1 18812 657048 13432 28366548 0 0 416 937 13823 16667 62 9 28 1 0
6 1 18812 361040 13452 28371908 0 0 323 522 14352 16789 74 12 14 0 0
33 0 18812 162760 12604 28210152 0 0 664 1544 14701 16315 66 11 22 2 0
5 0 18812 212028 10764 27921800 0 0 552 648 14567 17737 67 10 21 1 0
6 0 18796 279920 10548 27890388 3 0 359 562 12635 15976 60 9 30 1 0
8 0 18796 438820 10564 27894440 0 0 289 2144 12234 15770 57 8 34 1 0
5 0 18796 531800 10580 27901700 0 0 514 394 12169 16005 59 8 32 1 0
17 0 18796 645868 10596 27890704 0 0 423 948 13369 16554 67 10 23 1 0
9 1 18796 1076540 10612 27898604 0 0 598 403 12703 17363 71 10 18 1 0
8 0 18796 1666508 10628 27904748 0 0 430 1123 13314 17421 57 9 32 1 0
9 1 18776 1541444 10644 27913092 1 0 653 954 13194 16822 75 11 12 1 0
8 0 18776 1526728 10660 27921380 0 0 692 788 13073 16987 74 9 15 1 0
8 0 18776 1482304 10676 27933176 0 0 966 2029 13017 16651 76 12 11 1 0
21 0 18776 1683260 10700 27937492 0 0 298 663 13110 15796 67 10 23 1 0
18 0 18776 2087664 10716 27943512 0 0 406 622 12399 17072 62 9 28 1 0
With 300 connections, I think that either of these could lead you to
experience intermittent bursts of extreme swapping. I'd drop it to
somewhere in the 16MB to 32MB range until I had a connection pool
configured such that it was actually keeping the number of active
connections much lower.
Sure, I'd boost this.
> (*) wal_buffers = 8MB # pgtune wizard 2010-06-25
> (64kb , via default)
If you have the disk space for the 30 segments, I wouldn't reduce
> checkpoint_segments = 16 # pgtune wizard 2010-06-25
> (30 , specified)
it.
This one is perhaps the most sensitive to workload. Anywhere
> shared_buffers = 7680MB # pgtune wizard 2010-06-25
> (4096 MB , specified)
between 1GB and 8GB might be best for you. Greg Smith has some
great advice on how to tune this for your workload.
That's probably a good number to get to, but you have to reduce the
> (*) max_connections = 80 # pgtune wizard 2010-06-25
> (300 , ;-) specified)
>
> when i reduce max_connections i start getting errors, i will see
> again concurrent connections during business hours.
number of actual connections before you set the limit that low.
If any of these stay in that state for more than a minute or two,
> lot of our connections are in <IDLE> in transaction state
you need to address that if you want to get your connection count
under control. If any of them persist for hours or days, you need
to fix it to avoid bloat which can kill performance.
-Kevin