On Tue, Jan 23, 2018 at 5:59 AM, Rambabu V <ram.wissen@xxxxxxxxx> wrote:
> > cat PostgreSQL-2018-01-23_060000.csv|grep FATAL
> $ free -mh
>              total       used       free     shared    buffers     cached
> Mem:           58G        58G       358M        16G       3.6M        41G
> -/+ buffers/cache:        16G        42G
> Swap:         9.5G       687M       8.9G
This does not seem like it should be a problem. Was this data collected near the time of the failure?
> work_mem = 256MB                      # min 64kB
> max_connections = 600
These look pretty high, especially in combination. Why do you need that number of connections? Could you use a connection pooler instead? Or do you just have an application bug (leaked connection handles) that needs to be fixed? Why do you need that amount of work_mem?
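To give a rough sense of why that combination is risky (this is only a back-of-envelope sketch based on the settings you posted): work_mem is a per-sort/per-hash limit, not a per-connection cap, so a single query can use several multiples of it, and 600 backends each doing just one 256MB sort would already want far more memory than the box has:

    -- back-of-envelope only: one 256MB sort per backend, all 600 at once
    SELECT 600 * 256 AS worst_case_mb;   -- ~150 GB, vs. 58 GB of RAM
    -- a complex plan can have several sort/hash nodes, each entitled to
    -- its own work_mem, so the real worst case is even higher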
> ps -ef|grep postgres|grep idle|wc -l
> 171
> ps -ef|grep postgres|wc -l
> 206
How close to the time of the problem was this recorded? How many of the idle connections are 'idle in transaction'?
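If you'd rather answer that from inside the database than from ps, something like this (assuming a PostgreSQL version new enough to have the state column in pg_stat_activity, i.e. 9.2 or later) breaks the sessions down by state:

    SELECT state, count(*)
      FROM pg_stat_activity
     GROUP BY state
     ORDER BY count(*) DESC;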
>    PID USER      PRI  NI  VIRT   RES   SHR S CPU% MEM%   TIME+  Command
> 109063 postgres   20   0 16.7G 16.4G 16.3G S  0.0 27.8 39:55.61 postgres: test sss 10.20.2.228(55174) idle
>  24910 postgres   20   0 16.7G 16.4G 16.3G S  0.0 27.8 27:45.35 postgres: testl sss 10.20.2.228(55236) idle
> 115539 postgres   20   0 16.7G 16.4G 16.3G S  0.0 27.8 28:22.89 postgres: test sss 10.20.2.228(55184) idle
>   9816 postgres   20   0 16.7G 16.4G 16.3G S  0.0 27.8 40:19.57 postgres: test sss 10.20.2.228(55216) idle
How close to the time of the problem was this recorded? Nothing here seems to be a problem, because almost all the memory these backends have resident is shared memory (RES is nearly all SHR in the output above).
It looks like all your clients decide to run a memory-hungry query simultaneously, consume a lot of work_mem, and cause a problem. Then by the time you notice the problem and start collecting information, they are done and things are back to normal.
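One way to catch that in the act is to sample pg_stat_activity on a short interval around the time the problem usually hits. The snapshot table below is just an illustrative sketch (the table name is made up); run the INSERT every few seconds from cron or a shell loop and look at what was active when memory blew up:

    -- one-time setup: an empty copy of pg_stat_activity plus a timestamp
    CREATE TABLE activity_snapshot AS
        SELECT now() AS sampled_at, * FROM pg_stat_activity WHERE false;

    -- then run this repeatedly (e.g. every 5-10 seconds)
    INSERT INTO activity_snapshot
        SELECT now(), * FROM pg_stat_activity WHERE state <> 'idle';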
Cheers,
Jeff