On 03/05/2016 09:12 AM, Kevin Grittner wrote:
On Fri, Mar 4, 2016 at 3:23 PM, Chris Lajoie <chris@xxxxxxxxxxxxxxx> wrote:
In my log I have been getting errors that look like this:
LOG: could not fork new process for connection: Cannot allocate memory
This seems to only happen during periods of high activity.
Shortly after the last one occurred I checked the free mem and got
this:
# free -h
total used free shared buff/cache available
Mem: 9.4G 737M 283M 2.1G 8.4G 6.3G
Swap: 2.0G 8.3M 2.0G
From this I see the vast majority of the 10GB of memory is being
used by the OS cache. If I understand how it works correctly, this
means it is basically free memory because if an application needs
it the kernel will free some of it for use... is my understanding
correct?
That depends. In general, this report is light on useful detail;
you might want to review this page and fill in some of the missing
info, like OS version:
https://wiki.postgresql.org/wiki/Guide_to_reporting_problems
Appologies for not providing more useful info. I will endeavor to do so
in the future.
The most important missing information can be captured by running
these shell statements:
cat /proc/sys/vm/overcommit_memory
cat /proc/sys/vm/overcommit_ratio
If overcommit_memory is 2, you will get "out of memory" when an
allocation would push allocated memory past swap space (2GB in your
case) plus the percentage of machine RAM indicated by
overcommit_ratio. Effectively, the difference between
overcommit_ratio and 100 is the percentage of RAM you are reserving
for the OS to keep things memory-resident (like buffers and cache).
A value of 20, for example, would mean that an allocation that
would reduce OS memory-resident pages below 80% of machine RAM will
trigger an "out of memory" error; so when you set
vm.overcommit_memory = 2 (which is generally a very good thing to
do when running PostgreSQL on the machine) you should raise
vm.overcommit_ratio = 80 or 90. You don't want to totally blow
away OS cache, but you probably want to survive other demands on
RAM that bring it down quite a bit.
Despite not having provided more information you seem to have gotten to
the crux of my problem anyway. As I was a little desperate to resolve
the problem, I ended up changing various things in an attempt to just
"make it go away" for now. I set overcommit_memory back to 0 and
shared_buffers = 500MB. I believe this has the effect of just allowing
the OS to cache things as it sees fit, and only reserving 500MB for
postgres to explicitly cache whatever it wants. But I did not have any
more OOM errors over the weekend.
Your explanation of how overcommit_ratio affects what the OS will allow
when overcommit_memory=2 is enlightening, but I am still not clear on
some specifics. For example I thought postgres allocates the shared
buffers on startup. If this is the case, why did I get OOM errors on
connections? Was that particular connection simply pushing the memory
past the amount established by the overcommit_ratio? overcommit_ratio
was at its default (which is apparently 50 looking at
/proc/sys/vm/overcommit_ratio).
At this point I am not sure if I should change anything... I have seen
conflicting advice at various places on the internet. Some suggest
(assuming this is a dedicated database system) to use up a large chunk
of RAM with shared_buffers, leaving enough for work_mem*max_connections
+ some extra. Others suggest allowing the OS to cache what it wants and
set shared_buffers to a smaller amount, allowing postgres to permanently
cache only the most frequently accessed smaller tables.
I would greatly appreciate your advice on this.
I am running PostgreSQL 9.4.6 on Linux kernel 3.10 (CentOS 7).
Thank you,
Chris
--
Sent via pgsql-admin mailing list (pgsql-admin@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin