Re: Getting out ahead of OOM

Joe Conway <mail@xxxxxxxxxxxxx> · Sun, 9 Mar 2025 16:37:20 -0400

On 3/7/25 14:26, Tom Lane wrote:
> Joseph Hammerman <joe.hammerman@xxxxxxxxxxxxx> writes:
>> We run Postgres in a Kubernetes environment, and we have not to date been
>> able to convince our Compute team to create a class of Kubernetes hosts
>> that have memory overcommit disabled.
>
> :-(
>
>> Has anyone had success tracking all the Postgres memory allocation
>> configurables and using that to administratively prevent OOMing?
>
> I doubt anyone has tried that.  I would look into whether running
> the postmaster under a suitable ulimit helps.  I seem to recall
> discussions that in Linux, "ulimit -v" works better than the other
> likely-looking options.  But that might be stale information.

Problem with ulimit is that it is per process, but within a Kubernetes 
pod the memory accounting is for all the pod's processes.
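
To put rough numbers on that mismatch, here is a minimal sketch. The
figures and function names are made up for illustration and are not
measurements from any real system. Note also that "ulimit -v" caps
virtual address space per process, while the pod limit is charged against
memory actually in use, so the two numbers are only loosely comparable.

```python
GIB = 1024 ** 3


def worst_case_total(per_process_limit: int, n_processes: int) -> int:
    """Upper bound on the combined total if every process reaches its own ulimit."""
    return per_process_limit * n_processes


def fits_in_pod(per_process_limit: int, n_processes: int, pod_limit: int) -> bool:
    """True only if the per-process limits alone guarantee staying under the pod limit."""
    return worst_case_total(per_process_limit, n_processes) <= pod_limit
```

With a hypothetical 2 GiB ulimit and 100 backends, fits_in_pod(2 * GIB,
100, 16 * GIB) is False: every backend can stay well under its own limit
while the pod as a whole is still far over a 16 GiB budget.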

>> Alternatively, has anyone had success implementing an extension or periodic
>> process to monitor the memory consumption of the Postgres children and
>> kill them before the OOM event occurs?
>
> That's not going to be noticeably nicer than the kernel-induced
> OOM, I think.  The one thing it might do for you is ensure that
> the kill happens to a child process and not the postmaster; but
> you can already use PG_OOM_ADJUST_VALUE and PG_OOM_ADJUST_FILE
> to manage that if it's a problem.  (Recent kernels are alleged
> to usually do the right thing without that, though.)

Actually the problem here is likely that the Kubernetes Postgres pod was
started with a memory limit. Disabling memory overcommit at the host
level will not help you if there is a memory limit set for the pod,
because that in turn sets the memory limit on the pod's cgroup
(memory.limit_in_bytes under cgroup v1, memory.max under v2). The OOM
killer will strike when the cgroup's usage (memory.usage_in_bytes under
v1, memory.current under v2) exceeds that value, irrespective of the
free memory at the host level. In these cases the oom_score_adj values
don't end up mattering much.
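
If you want to see how close a pod is to that limit, something along
these lines may help. It is only a sketch: the function names and the
"unlimited" threshold are my own assumptions, and the cgroup directory
you pass in depends on how the container runtime mounts things (often
/sys/fs/cgroup inside the container under v2, or /sys/fs/cgroup/memory
under v1).

```python
from pathlib import Path
from typing import Optional, Tuple

# (limit file, usage file) for each cgroup version.
V2_FILES = ("memory.max", "memory.current")
V1_FILES = ("memory.limit_in_bytes", "memory.usage_in_bytes")

# cgroup v1 reports "no limit" as a very large number. Treating anything
# at or above this threshold as unlimited is a heuristic (assumption).
UNLIMITED_THRESHOLD = 1 << 60


def read_cgroup_memory(cgroup_dir: str) -> Tuple[Optional[int], int]:
    """Return (limit in bytes, or None if unlimited; usage in bytes)."""
    base = Path(cgroup_dir)
    for limit_name, usage_name in (V2_FILES, V1_FILES):
        limit_path = base / limit_name
        usage_path = base / usage_name
        if limit_path.exists() and usage_path.exists():
            raw_limit = limit_path.read_text().strip()
            usage = int(usage_path.read_text().strip())
            # cgroup v2 writes the literal string "max" when no limit is set.
            if raw_limit == "max" or int(raw_limit) >= UNLIMITED_THRESHOLD:
                return None, usage
            return int(raw_limit), usage
    raise FileNotFoundError(f"no cgroup memory files found in {cgroup_dir}")


def headroom_bytes(cgroup_dir: str) -> Optional[int]:
    """Bytes left before the cgroup limit is reached, or None if unlimited."""
    limit, usage = read_cgroup_memory(cgroup_dir)
    return None if limit is None else limit - usage
```

A result of None from headroom_bytes() means no cgroup limit is set, in
which case only host-level memory pressure should bring out the OOM
killer.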

This is a fairly complex topic -- I wrote a blog post a few years ago
which may or may not be out of date at this point:

https://www.crunchydata.com/blog/deep-postgresql-thoughts-the-linux-assassin

Additionally Jeremy Schneider wrote a more recent one that you might 
find helpful:

https://ardentperf.com/2024/09/22/kubernetes-requests-and-limits-for-postgres/

My quick and dirty recommendations:
1. Use cgroup v2 on the host if at all possible
2. Do not under any circumstances disable swap on the host. This is an
   anti-pattern unfortunately followed widely the last time I looked.
3. If nothing else, avoid setting a memory limit on the cgroup. That
   will at least get you back to not getting whacked unless there is
   host level memory pressure. The blogs discuss how to do that with
   Kube pod settings.
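
For the first two items, a quick way to check a host might look like the
sketch below. The function names are hypothetical, and it assumes the
usual Linux locations: a cgroup.controllers file at the cgroup root when
the v2 unified hierarchy is mounted there, and a SwapTotal line in
/proc/meminfo.

```python
from pathlib import Path


def is_cgroup_v2(cgroup_root: str = "/sys/fs/cgroup") -> bool:
    """True if cgroup_root looks like a cgroup v2 (unified) hierarchy."""
    return (Path(cgroup_root) / "cgroup.controllers").is_file()


def swap_total_kib(meminfo_text: str) -> int:
    """Return SwapTotal in kB from /proc/meminfo-style text, or 0 if absent."""
    for line in meminfo_text.splitlines():
        if line.startswith("SwapTotal:"):
            # Line format: "SwapTotal:       2097148 kB"
            return int(line.split()[1])
    return 0
```

On a host you would call is_cgroup_v2() with the default path and
swap_total_kib(Path("/proc/meminfo").read_text()); a swap total of 0
means swap is disabled, which is the anti-pattern from item 2.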

HTH,

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com