Re: Getting OOM errors from PostgreSQL

On Mon, Mar 7, 2016 at 11:30 AM, Chris Lajoie <chris@xxxxxxxxxxxxxxx> wrote:
> On 03/05/2016 09:12 AM, Kevin Grittner wrote:
>>
>> On Fri, Mar 4, 2016 at 3:23 PM, Chris Lajoie <chris@xxxxxxxxxxxxxxx>
>> wrote:
>>>
>>> In my log I have been getting errors that look like this:
>>> LOG:  could not fork new process for connection: Cannot allocate memory
>>>
>>> This seems to only happen during periods of high activity.
>>> Shortly after the last one occurred I checked the free mem and got
>>> this:
>>> # free -h
>>>               total        used        free      shared  buff/cache   available
>>> Mem:           9.4G        737M        283M        2.1G        8.4G        6.3G
>>> Swap:          2.0G        8.3M        2.0G
>>>
>>> From this I see the vast majority of the 10GB of memory is being
>>> used by the OS cache. If I understand how it works correctly, this
>>> means it is basically free memory because if an application needs
>>> it the kernel will free some of it for use... is my understanding
>>> correct?
>>
>> That depends.  In general, this report is light on useful detail;
>> you might want to review this page and fill in some of the missing
>> info, like OS version:
>>
>> https://wiki.postgresql.org/wiki/Guide_to_reporting_problems
>
> Apologies for not providing more useful info. I will endeavor to do so in
> the future.
>>
>> The most important missing information can be captured by running
>> these shell statements:
>>
>> cat /proc/sys/vm/overcommit_memory
>> cat /proc/sys/vm/overcommit_ratio
>>
>> If overcommit_memory is 2, you will get "out of memory" when an
>> allocation would push allocated memory past swap space (2GB in your
>> case) plus the percentage of machine RAM indicated by
>> overcommit_ratio.  Effectively, the difference between
>> overcommit_ratio and 100 is the percentage of RAM you are reserving
>> for the OS to keep things memory-resident (like buffers and cache).
>> A value of 20, for example, would mean that an allocation that
>> would reduce OS memory-resident pages below 80% of machine RAM will
>> trigger an "out of memory" error; so when you set
>> vm.overcommit_memory = 2 (which is generally a very good thing to
>> do when running PostgreSQL on the machine) you should raise
>> vm.overcommit_ratio = 80 or 90.  You don't want to totally blow
>> away OS cache, but you probably want to survive other demands on
>> RAM that bring it down quite a bit.
>
> Despite not having provided more information you seem to have gotten to the
> crux of my problem anyway. As I was a little desperate to resolve the
> problem, I ended up changing various things in an attempt to just "make it
> go away" for now. I set overcommit_memory back to 0 and shared_buffers =
> 500MB. I believe this has the effect of just allowing the OS to cache things
> as it sees fit, and only reserving 500MB for postgres to explicitly cache
> whatever it wants. But I did not have any more OOM errors over the weekend.
>
> Your explanation of how overcommit_ratio affects what the OS will allow when
> overcommit_memory=2 is enlightening, but I am still not clear on some
> specifics. For example I thought postgres allocates the shared buffers on
> startup. If this is the case, why did I get OOM errors on connections? Was
> that particular connection simply pushing the memory past the amount
> established by the overcommit_ratio? overcommit_ratio was at its default
> (which is apparently 50 looking at /proc/sys/vm/overcommit_ratio).
>
> At this point I am not sure if I should change anything... I have seen
> conflicting advice at various places on the internet. Some suggest (assuming
> this is a dedicated database system) to use up a large chunk of RAM with
> shared_buffers, leaving enough for work_mem*max_connections + some extra.
> Others suggest allowing the OS to cache what it wants and set shared_buffers
> to a smaller amount, allowing postgres to permanently cache only the most
> frequently accessed smaller tables.
>
> I would greatly appreciate your advice on this.
> I am running PostgreSQL 9.4.6 on Linux kernel 3.10 (CentOS 7).
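
For what it's worth, the kernel's own accounting is easy to check
(these are standard /proc/meminfo fields, nothing Postgres-specific):

grep -E 'CommitLimit|Committed_AS' /proc/meminfo

With 9.4G of RAM, 2G of swap and the default overcommit_ratio of 50,
strict overcommit (vm.overcommit_memory = 2) puts CommitLimit at
roughly 2G + 0.5 * 9.4G, i.e. about 6.7G. Once Committed_AS gets near
that, forking a new backend fails with "Cannot allocate memory" even
though free / buff+cache look healthy, because page cache does not
count against the commit limit.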


Note that when running with overcommit_memory = 0, if you do start to
run out of memory, the OOM killer will often kill the postmaster. This
is bad. If that happens, make sure to kill all postgres children before
trying to restart the db, as starting a new postmaster while children
from the old one are still running will instantly and permanently
corrupt / destroy your db.
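
A rough way to double-check before restarting (adjust the user name if
your cluster doesn't run as the usual "postgres" account):

ps -u postgres -o pid,args
# only start a new postmaster once this shows no leftover backends
# belonging to the old one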

The eventual state you want is to be able to run with
vm.overcommit_memory = 2 and settings that keep allocations from
failing.
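
On CentOS 7 that would look something like the below; the file name is
just an example, and 80 is the starting point Kevin suggested for the
ratio, not a universal value:

# /etc/sysctl.d/99-postgres.conf
vm.overcommit_memory = 2
vm.overcommit_ratio = 80

# then load it:
sysctl -p /etc/sysctl.d/99-postgres.conf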

Note too that if you never seem to actually run out of memory but
still get allocation errors, it can also be a lack of file handles (I
think that's what caused it for me in the past; it's been a while).
The point is that you can get a failure to allocate memory when
there's plenty of memory, due to other settings on your server.
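
If you want to rule file handles in or out, the usual places to look
are the per-process and system-wide limits, plus PostgreSQL's own cap:

ulimit -n                    # per-process limit in the current shell
cat /proc/sys/fs/file-nr     # allocated handles, free, system-wide max
cat /proc/sys/fs/file-max    # system-wide max on its own

and max_files_per_process in postgresql.conf.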


-- 
Sent via pgsql-admin mailing list (pgsql-admin@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin


