Today has been good since the morning. Although it is a lean day
for us, the indications are nice. I thank everyone who shared
their concern. I think the most significant change has been
reducing shared_buffers from 10GB to 4GB; this has led to lower
memory usage and some breathing space for the OS.
Although I have yet to incorporate the suggestions from pgtune, I
think the issue of max_connections needs to be addressed first. I
am investigating application issues, in particular whatever
mechanism leaves many backends in '<IDLE> in transaction' state
for significant periods. I thank Tom for the script he sent. Once
that is resolved I shall look into pooling as suggested by Kevin,
and then max_connections can eventually be reduced. I shall also
check whether pgpool or pgbouncer would be helpful in this regard.
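Something along these lines (not Tom's script, just a generic
sketch assuming the 8.x pg_stat_activity layout with procpid and
current_query) shows which backends are sitting in that state and
for how long:

  SELECT procpid, usename, datname,
         now() - xact_start  AS in_xact_for,
         now() - query_start AS idle_for
    FROM pg_stat_activity
   WHERE current_query = '<IDLE> in transaction'
   ORDER BY xact_start;
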
I observed that the number of simultaneous connections today (a
lean day) hovers between 1 and 10, occasionally shooting up to 15,
but never above 20, I would say.
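A rough way to watch that from the database side, counting
backends by what they are currently doing (same 8.x column
assumption):

  SELECT CASE
           WHEN current_query = '<IDLE>'                THEN 'idle'
           WHEN current_query = '<IDLE> in transaction' THEN 'idle in transaction'
           ELSE 'active'
         END AS state,
         count(*) AS backends
    FROM pg_stat_activity
   GROUP BY 1
   ORDER BY backends DESC;
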
I am happy that I/O waits are negligible and the CPU is idle part
of the time as well. A sample of vmstat output:
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
r b swpd free buff cache si so bi bo in cs us sy id wa st
22 0 18468 954120 13460 28491772 0 0 568 1558 13645 18355 62 10 27 2 0
16 0 18468 208100 13476 28469084 0 0 580 671 14039 17055 67 13 19 1 0
10 2 18812 329032 13400 28356972 0 46 301 1768 13848 17884 68 10 20 1 0
16 2 18812 366596 13416 28361620 0 0 325 535 13957 16649 72 11 16 1 0
50 1 18812 657048 13432 28366548 0 0 416 937 13823 16667 62 9 28 1 0
6 1 18812 361040 13452 28371908 0 0 323 522 14352 16789 74 12 14 0 0
33 0 18812 162760 12604 28210152 0 0 664 1544 14701 16315 66 11 22 2 0
5 0 18812 212028 10764 27921800 0 0 552 648 14567 17737 67 10 21 1 0
6 0 18796 279920 10548 27890388 3 0 359 562 12635 15976 60 9 30 1 0
8 0 18796 438820 10564 27894440 0 0 289 2144 12234 15770 57 8 34 1 0
5 0 18796 531800 10580 27901700 0 0 514 394 12169 16005 59 8 32 1 0
17 0 18796 645868 10596 27890704 0 0 423 948 13369 16554 67 10 23 1 0
9 1 18796 1076540 10612 27898604 0 0 598 403 12703 17363 71 10 18 1 0
8 0 18796 1666508 10628 27904748 0 0 430 1123 13314 17421 57 9 32 1 0
9 1 18776 1541444 10644 27913092 1 0 653 954 13194 16822 75 11 12 1 0
8 0 18776 1526728 10660 27921380 0 0 692 788 13073 16987 74 9 15 1 0
8 0 18776 1482304 10676 27933176 0 0 966 2029 13017 16651 76 12 11 1 0
21 0 18776 1683260 10700 27937492 0 0 298 663 13110 15796 67 10 23 1 0
18 0 18776 2087664 10716 27943512 0 0 406 622 12399 17072 62 9 28 1 0
With 300 connections, I think that either of these could lead you to
experience intermittent bursts of extreme swapping. I'd drop it to
somewhere in the 16MB to 32MB range until I had a connection pool
configured such that it was actually keeping the number of active
connections much lower.
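The arithmetic behind that is simply per-backend memory times the
connection count: a sort/hash allocation of a few tens of MB in
each of 300 backends can add up to several GB, for example:

  -- worst case if each of 300 backends grabbed one 32MB allocation at once
  SELECT 300 * 32 AS potential_mb;   -- 9600 MB
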
> (*) wal_buffers = 8MB # pgtune wizard 2010-06-25
> (64kb , via default)

Sure, I'd boost this.
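One detail that sometimes confuses here: pg_settings reports
wal_buffers in 8kB pages, so the 64kB default shows up as 8, and
the change only takes effect after a restart. A quick check of
what is in effect:

  SELECT name, setting, unit, context
    FROM pg_settings
   WHERE name = 'wal_buffers';
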
> checkpoint_segments = 16 # pgtune wizard 2010-06-25
> (30 , specified)

If you have the disk space for the 30 segments, I wouldn't reduce
it.
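On the disk-space question, pg_xlog normally stays below about
(2 + checkpoint_completion_target) * checkpoint_segments + 1
segments of 16MB each, so with 30 segments and the default 0.5
completion target that works out to roughly:

  -- (2 + 0.5) * 30 + 1 = 76 segments of 16MB each
  SELECT ((2 + 0.5) * 30 + 1) * 16 AS approx_max_pg_xlog_mb;   -- about 1216 MB
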
> shared_buffers = 7680MB # pgtune wizard 2010-06-25
> (4096 MB , specified)

This one is perhaps the most sensitive to workload. Anywhere
between 1GB and 8GB might be best for you. Greg Smith has some
great advice on how to tune this for your workload.
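If the pg_buffercache contrib module is installed, the usage-count
distribution of the cache gives a feel for how hard shared_buffers
is actually being used, e.g.:

  -- how many buffers sit at each usage count, and how many are dirty
  SELECT usagecount,
         count(*) AS buffers,
         sum(CASE WHEN isdirty THEN 1 ELSE 0 END) AS dirty
    FROM pg_buffercache
   GROUP BY usagecount
   ORDER BY usagecount;
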
> (*) max_connections = 80 # pgtune wizard 2010-06-25
> (300 , ;-) specified)
>
> when i reduce max_connections i start getting errors, i will see
> again concurrent connections during business hours.

That's probably a good number to get to, but you have to reduce
the number of actual connections before you set the limit that
low.
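Before settling on a pool size, it helps to see where the existing
connections come from, e.g. grouped by user and client address:

  SELECT usename, client_addr, count(*) AS conns
    FROM pg_stat_activity
   GROUP BY usename, client_addr
   ORDER BY conns DESC;
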
> lot of our connections are in <IDLE> in transaction state

If any of these stay in that state for more than a minute or two,
you need to address that if you want to get your connection count
under control. If any of them persist for hours or days, you need
to fix it to avoid bloat which can kill performance.
-Kevin