Hey folks,I have 4 postgresql servers 9.3.6(on master I use 9.3.5) configured with streaming replication - with 1 maser(30GB RAM, processor - Intel Xeon E5-2680 v2) and 3 slaves(61 Intel Xeon E5-2670 v2), all on Ubuntu 14.04.1 LTS,Master configuration:default_statistics_target = 50
maintenance_work_mem = 1GB
constraint_exclusion = on
checkpoint_completion_target = 0.9
effective_cache_size = 22GB
work_mem = 120MB
wal_buffers = 8MB
checkpoint_segments = 16
shared_buffers = 7GB
max_connections = 300
Slave configuration:
max_connections = 300
shared_buffers = 10GB
effective_cache_size = 45GB
work_mem = 19660kB
maintenance_work_mem = 2GB
checkpoint_segments = 32
checkpoint_completion_target = 0.7
wal_buffers = 16MB
default_statistics_target = 100
I am using XSF file system, size of my database – 168GB.
For linux kernel I have settings:
vm.swappiness = 0
vm.overcommit_memory = 2
vm.overcommit_ratio = 50
kernel.shmall = 7864320kernel.shmmax = 32212254720
kernel.shmmni = 4096
Master is primary to write data. Slave – for reporting. In reality I am using not more then 100 connections to slave server at the same time. Writing about 3000 records in a minute.
I have one table where I writing statistics, that is portioned by month. Below is table size with biggest relations
public.stats_201408 | 9212 MB
public.stats_201503 | 8868 MB
pg_toast.pg_toast_6404464 | 8319 MB
pg_toast.pg_toast_317921 | 7520 MB
public.stats_201409 | 7101 MB
public.stats_201412 | 4458 MB
I see here pg_toast, from doc I read it’s large objects there, but It’s not related to my table stats, which is read/write heavy(type of the biggest column in this table => character varying(3000)). I.e. - it’s related to different table.
My application create 100 connections & keeping them during whole life cycle(usually until next deploy – that may happen in couple days), with time – connection growing in memory(checking using htop) & free memory going down. As result with time(usually 3-4 hours) my DB start throwing
ERROR: out of memory
DETAIL: Failed on request of size 2048.
After I restart my application(reconnect to DB), it start working fine again.
It even fail on simple query like:
SELECT COUNT(*) FROM “stats" WHERE "stats”."bt_id" = $1 AND (stats.created_at >= '2015-04-02 04:00:00.000000') AND (stats.created_at <= '2015-04-03 03:59:59.999999') AND ("stats"."source" IS NOT NULL) AND "stats"."device" IN (1, 2) AND "stats”."ra" = 0 AND "stats"."paid" = ’t'
Any help appreciated.
--
I reserve the right to fantasize. Whether or not you
wish to share my fantasy is entirely up to you.