Hello Postgresql experts,
We are facing issues with our PostgreSQL databases running on an Ubuntu server, right after we migrated them from an openSUSE OS.
The new database server runs fine most of the time (average load 0.5 to 1.0) but spikes suddenly once or twice a day. This has happened four times in the last three days, and during these spikes simple UPDATE/SELECT statements start taking minutes (1 to 5 minutes) instead of 5-50 ms.
The spikes also max out the database's 250 connections. Each event halts all processing for about 15-20 minutes, after which everything returns to normal. I verified checkpoint- and vacuum-related activity, but it doesn't show any problem to me (logs attached).
top/vmstat output shows that all CPU was suddenly consumed in user time (%us) during the same window, while iostat doesn't show any I/O bottleneck. I have attached complete logs for yesterday's outage (13:45 to 14:15).
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
44 0 0 201929344 345260 50775772 0 0 2 15 2 2 2 0 98 0
40 0 0 201919264 345260 50775864 0 0 0 224 9409 1663 98 1 1 0
40 0 0 201915344 345260 50775880 0 0 0 280 8803 1674 99 0 0 0
38 0 0 201911296 345260 50775888 0 0 0 156 8753 1469 99 0 0 0
40 0 0 201902416 345260 50775888 0 0 0 224 9060 2775 98 1 1 0
free -m
total used free shared buffers cached
Mem: 251 59 192 0 0 48
-/+ buffers/cache: 10 241
Swap: 29 0 29
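If it is useful, the kind of checkpoint/vacuum check referred to above can be reproduced against the standard 9.0 statistics views, roughly as follows (a sketch only, not our exact queries):

-- Checkpoint activity: timed vs. requested checkpoints and buffers written.
SELECT checkpoints_timed, checkpoints_req,
       buffers_checkpoint, buffers_clean, buffers_backend
FROM pg_stat_bgwriter;

-- Vacuum/analyze recency and dead-tuple counts for the busiest tables.
SELECT relname, n_dead_tup,
       last_vacuum, last_autovacuum, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 20;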
System information:
Connections to our databases come from a web server (Apache with PHP) and script servers (PHP). We verified the Apache logs and did not find any unusual connection traffic during the same interval.
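During the next spike we can also capture what each backend is doing from pg_stat_activity (a sketch for 9.0, where the columns are procpid/current_query rather than pid/query):

-- Non-idle backends, how long each query has been running, and whether it is waiting on a lock.
SELECT procpid, usename, waiting,
       now() - query_start AS runtime,
       current_query
FROM pg_stat_activity
WHERE current_query <> '<IDLE>'
ORDER BY query_start;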
Hardware information:
DELL PowerEdge R715
Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz
Ubuntu 12.04.4 LTS
kernel: 3.8.0-35-generic 64 bit
PostgreSQL version: 9.0.13
RAM: 256 GB, 32 CPU cores
pg_xlog: RAID 1
data directory: RAID 10 (6 stripes)
read:write ratio: 85:15
PgBouncer configured on the database side (250 allowed connections)
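For what it's worth, PgBouncer's pool state during a spike can also be inspected from its admin console (connecting to the special "pgbouncer" database), for example:

SHOW POOLS;   -- cl_active / cl_waiting / sv_active / sv_idle per pool
SHOW STATS;   -- request and traffic totals per database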
PostgreSQL configuration:
Database size: 28 GB
Vacuum/analyze run daily
checkpoint_completion_target = 0.9
maintenance_work_mem = 16GB
shared_buffers = 8GB # we reduced this from 32 GB.
max_connections = 300
checkpoint_segments = 32
checkpoint_timeout = 8min
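The effective values of these settings can be cross-checked from pg_settings if needed, e.g. (a sketch):

SELECT name, setting, unit, source
FROM pg_settings
WHERE name IN ('shared_buffers', 'maintenance_work_mem', 'max_connections',
               'checkpoint_segments', 'checkpoint_timeout', 'checkpoint_completion_target');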
Detailed PostgreSQL configuration: http://pastie.org/8754957
Checkpoint/vacuum information: http://pastie.org/8754954
top command output: http://pastie.org/8755007
iostat output: http://pastie.org/8755009
sysctl configuration: http://pastie.org/8755197
We recently upgraded the OS kernel to try to fix this issue, but it didn't help. We also tried modifying some OS parameters based on discussions such as:
http://www.postgresql.org/message-id/50E4AAB1.9040902@xxxxxxxxxxxxxxxx
vm.dirty_background_bytes = 33554432 # I reduced this based on some forums.
vm.dirty_bytes = 536870912
vm.overcommit_memory=2
kernel.sched_migration_cost = 5000000
kernel.sched_autogroup_enabled = 0
We believe our PostgreSQL configuration is not sized correctly for the memory available on this machine and needs urgent tuning.
Could you please guide us in troubleshooting this issue?
Thanks in advance.