Hey folks,

I have 4 PostgreSQL servers running 9.3.6 (the master runs 9.3.5) configured with streaming replication: 1 master (30GB RAM, Intel Xeon E5-2680 v2) and 3 slaves (61GB RAM, Intel Xeon E5-2670 v2), all on Ubuntu 14.04.1 LTS.

Master configuration:

    default_statistics_target = 50
    maintenance_work_mem = 1GB
    constraint_exclusion = on
    checkpoint_completion_target = 0.9
    effective_cache_size = 22GB
    work_mem = 120MB
    wal_buffers = 8MB
    checkpoint_segments = 16
    shared_buffers = 7GB
    max_connections = 300

Slave configuration:

    max_connections = 300
    shared_buffers = 10GB
    effective_cache_size = 45GB
    work_mem = 19660kB
    maintenance_work_mem = 2GB
    checkpoint_segments = 32
    checkpoint_completion_target = 0.7
    wal_buffers = 16MB
    default_statistics_target = 100
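In case it is useful, the live values can be cross-checked on each server against the standard pg_settings view, e.g.:

    SELECT name, setting, unit
    FROM pg_settings
    WHERE name IN ('shared_buffers', 'work_mem', 'maintenance_work_mem',
                   'effective_cache_size', 'max_connections');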
I am using the XFS file system; the size of my database is 168GB.
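That size figure can be reproduced with:

    SELECT pg_size_pretty(pg_database_size(current_database()));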
Linux kernel settings:

    vm.swappiness = 0
    vm.overcommit_memory = 2
    vm.overcommit_ratio = 50
    kernel.shmall = 7864320
    kernel.shmmax = 32212254720
    kernel.shmmni = 4096

The master is primarily for writing data; the slaves are for reporting. In reality I use no more than 100 connections to a slave server at the same time, and I write about 3000 records per minute. I have one table, stats, where I write statistics; it is partitioned by month. Below are the biggest relations by size:

    public.stats_201408        | 9212 MB
    public.stats_201503        | 8868 MB
    pg_toast.pg_toast_6404464  | 8319 MB
    pg_toast.pg_toast_317921   | 7520 MB
    public.stats_201409        | 7101 MB
    public.stats_201412        | 4458 MB

I also see pg_toast relations here. From the docs I understand TOAST storage holds oversized column values, but these are not related to my stats table, which is the read/write-heavy one (its biggest column is character varying(3000)); i.e. they belong to different tables.
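To double-check which table each TOAST relation belongs to, the owning table can be looked up through pg_class.reltoastrelid, e.g. for one of the relations above:

    SELECT n.nspname, c.relname
    FROM pg_class c
    JOIN pg_namespace n ON n.oid = c.relnamespace
    WHERE c.reltoastrelid = 'pg_toast.pg_toast_317921'::regclass;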
My application creates 100 connections and keeps them open for its whole life cycle (usually until the next deploy, which may be a couple of days later). Over time the connections grow in memory (I check with htop) and free memory goes down. As a result, after 3-4 hours my DB starts throwing:

    ERROR:  out of memory
    DETAIL:  Failed on request of size 2048.

After I restart my application (reconnecting to the DB), it starts working fine again. It fails even on a simple query like:

    SELECT COUNT(*) FROM "stats"
    WHERE "stats"."bt_id" = $1
      AND (stats.created_at >= '2015-04-02 04:00:00.000000')
      AND (stats.created_at <= '2015-04-03 03:59:59.999999')
      AND ("stats"."source" IS NOT NULL)
      AND "stats"."device" IN (1, 2)
      AND "stats"."ra" = 0
      AND "stats"."paid" = 't'

Any help appreciated.