On Fri, Nov 03, 2017 at 01:43:32AM +0000, tao tony wrote:
> I had an asynchronous streaming replication HA cluster. Each node had 64G memory. pg is 9.6.2 and deployed on centos 6.
>
> Last month the database was killed by OS kernel for OOM, the checkpoint process was killed.

If you still have logs, was it killed during a large query? Perhaps one using a hash aggregate?

> I noticed checkpoint process occupied memory for more than 20GB, and it was growing everyday. In the hot-standby node, the recovering process occupied memory as big as checkpoint process.

The "resident" RAM of a postgres subprocess is often just the fraction of shared_buffers it has read or written. The checkpointer must necessarily read all dirty pages from shared_buffers and write them out to disk (by way of the page cache), so over time it touches most of shared_buffers, and that's why its RSS grows toward 32GB. The recovery process is likewise continuously writing into shared_buffers. In your top output SHR is 25g for both processes, the same as RES, so nearly all of that memory is the one shared segment counted twice, not 50GB of separate allocations.

> Now in the standby node, checkpoint and recovering process used more than 50GB memory as below, and I worried someday the cluster would be killed by OS again.
>
>    PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
> 167158 postgres  20   0 34.9g  25g  25g S  0.0 40.4 46:36.86 postgres: startup process recovering 00000004000008550000004B
> 167162 postgres  20   0 34.9g  25g  25g S  0.0 40.2 17:58.38 postgres: checkpointer process
>
> shared_buffers = 32GB

Also, what is work_mem?

Justin

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
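
As an illustration of the RES vs. SHR point above (not part of the original message): one rough way to see how much of a postgres process's resident memory is private is to subtract the shared pages from the resident pages in /proc/<pid>/statm, which is approximately what top reports as RES and SHR. The sketch below assumes Linux; parse_statm and read_statm are hypothetical helper names, and the 4096-byte default page size is an assumption (read_statm asks the OS for the real value). If shared_buffers is backed by huge pages, these counters may not include it at all.

```python
import os
import sys


def parse_statm(statm_text, page_size=4096):
    """Return (resident, shared, private) in bytes from a /proc/<pid>/statm line.

    statm fields are page counts: size resident shared text lib data dt.
    'private' here is simply resident minus shared, a rough estimate.
    """
    fields = statm_text.split()
    resident = int(fields[1]) * page_size
    shared = int(fields[2]) * page_size
    return resident, shared, resident - shared


def read_statm(pid):
    """Read and parse /proc/<pid>/statm using the system page size (Linux only)."""
    with open("/proc/%d/statm" % pid) as f:
        return parse_statm(f.read(), os.sysconf("SC_PAGE_SIZE"))


if __name__ == "__main__":
    # Usage: pass one or more PIDs, e.g. the startup and checkpointer processes.
    for arg in sys.argv[1:]:
        res, shr, priv = read_statm(int(arg))
        mib = 1024.0 * 1024.0
        print("pid %s: RES %.0f MiB, SHR %.0f MiB, private ~%.0f MiB"
              % (arg, res / mib, shr / mib, priv / mib))
```

If the private figure stays small while RES keeps growing, the growth is most likely just more of shared_buffers being touched, not memory that the OOM killer would reclaim by killing that process.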