Hey folks,

I have 4 PostgreSQL servers running 9.3.6 (the master runs 9.3.5) configured with streaming replication: 1 master (30GB RAM, Intel Xeon E5-2680 v2) and 3 slaves (61GB RAM, Intel Xeon E5-2670 v2), all on Ubuntu 14.04.1 LTS.

Master configuration:

    default_statistics_target = 50
    maintenance_work_mem = 1GB
    constraint_exclusion = on
    checkpoint_completion_target = 0.9
    effective_cache_size = 22GB
    work_mem = 120MB
    wal_buffers = 8MB
    checkpoint_segments = 16
    shared_buffers = 7GB
    max_connections = 300

Slave configuration:

    max_connections = 300
    shared_buffers = 10GB
    effective_cache_size = 45GB
    work_mem = 19660kB
    maintenance_work_mem = 2GB
    checkpoint_segments = 32
    checkpoint_completion_target = 0.7
    wal_buffers = 16MB
    default_statistics_target = 100
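In case it is useful, the live values can be cross-checked on each server against the standard pg_settings view, e.g.:

    SELECT name, setting, unit
    FROM pg_settings
    WHERE name IN ('shared_buffers', 'work_mem', 'maintenance_work_mem',
                   'effective_cache_size', 'max_connections');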
I am using the XFS file system; the size of my database is 168GB.
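That size figure can be reproduced with:

    SELECT pg_size_pretty(pg_database_size(current_database()));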
Linux kernel settings:

    vm.swappiness = 0
    vm.overcommit_memory = 2
    vm.overcommit_ratio = 50
    kernel.shmall = 7864320
    kernel.shmmax = 32212254720
    kernel.shmmni = 4096

The master is primarily for writing data; the slaves are for reporting. In reality I use no more than 100 connections to a slave server at the same time, and I write about 3000 records per minute. I have one table, stats, where I write statistics; it is partitioned by month. Below are the biggest relations by size:

    public.stats_201408        | 9212 MB
    public.stats_201503        | 8868 MB
    pg_toast.pg_toast_6404464  | 8319 MB
    pg_toast.pg_toast_317921   | 7520 MB
    public.stats_201409        | 7101 MB
    public.stats_201412        | 4458 MB

I also see pg_toast relations here. From the docs I understand TOAST storage holds oversized column values, but these are not related to my stats table, which is the read/write-heavy one (its biggest column is character varying(3000)); i.e. they belong to different tables.
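To double-check which table each TOAST relation belongs to, the owning table can be looked up through pg_class.reltoastrelid, e.g. for one of the relations above:

    SELECT n.nspname, c.relname
    FROM pg_class c
    JOIN pg_namespace n ON n.oid = c.relnamespace
    WHERE c.reltoastrelid = 'pg_toast.pg_toast_317921'::regclass;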
My application creates 100 connections and keeps them open for its whole life cycle (usually until the next deploy, which may be a couple of days later). Over time the connections grow in memory (I check with htop) and free memory goes down. As a result, after 3-4 hours my DB starts throwing:

    ERROR:  out of memory
    DETAIL:  Failed on request of size 2048.

After I restart my application (reconnecting to the DB), it starts working fine again. It fails even on a simple query like:

    SELECT COUNT(*) FROM "stats"
    WHERE "stats"."bt_id" = $1
      AND (stats.created_at >= '2015-04-02 04:00:00.000000')
      AND (stats.created_at <= '2015-04-03 03:59:59.999999')
      AND ("stats"."source" IS NOT NULL)
      AND "stats"."device" IN (1, 2)
      AND "stats"."ra" = 0
      AND "stats"."paid" = 't'

Any help appreciated.