On 2020-02-18 18:10:08 +0000, Nagaraj Raj wrote: > Below are the same configurations ins .conf file before and after updagrade > > show max_connections; = 1743 [...] > show work_mem = "4MB" This is an interesting combination: So you expect a large number of connections but each one should use very little RAM? [...] > here is some sys logs, > > 2020-02-16 21:01:17 UTC [-]The database process was killed by the OS > due to excessive memory consumption. > 2020-02-16 13:41:16 UTC [-]The database process was killed by the OS > due to excessive memory consumption. The oom-killer produces a huge block of messages which you can find with dmesg or in your syslog. It looks something like this: Feb 19 19:06:53 akran kernel: [3026711.344817] platzangst invoked oom-killer: gfp_mask=0x15080c0(GFP_KERNEL_ACCOUNT|__GFP_ZERO), nodemask=(null), order=1, oom_score_adj=0 Feb 19 19:06:53 akran kernel: [3026711.344819] platzangst cpuset=/ mems_allowed=0-1 Feb 19 19:06:53 akran kernel: [3026711.344825] CPU: 7 PID: 2012 Comm: platzangst Tainted: G OE 4.15.0-74-generic #84-Ubuntu Feb 19 19:06:53 akran kernel: [3026711.344826] Hardware name: Dell Inc. PowerEdge R630/02C2CP, BIOS 2.1.7 06/16/2016 Feb 19 19:06:53 akran kernel: [3026711.344827] Call Trace: Feb 19 19:06:53 akran kernel: [3026711.344835] dump_stack+0x6d/0x8e Feb 19 19:06:53 akran kernel: [3026711.344839] dump_header+0x71/0x285 ... Feb 19 19:06:53 akran kernel: [3026711.344893] RIP: 0033:0x7f292d076b1c Feb 19 19:06:53 akran kernel: [3026711.344894] RSP: 002b:00007fff187ef240 EFLAGS: 00000246 ORIG_RAX: 0000000000000038 Feb 19 19:06:53 akran kernel: [3026711.344895] RAX: ffffffffffffffda RBX: 00007fff187ef240 RCX: 00007f292d076b1c Feb 19 19:06:53 akran kernel: [3026711.344896] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000001200011 Feb 19 19:06:53 akran kernel: [3026711.344897] RBP: 00007fff187ef2b0 R08: 00007f292d596740 R09: 00000000009d43a0 Feb 19 19:06:53 akran kernel: [3026711.344897] R10: 00007f292d596a10 R11: 0000000000000246 R12: 0000000000000000 Feb 19 19:06:53 akran kernel: [3026711.344898] R13: 0000000000000020 R14: 0000000000000000 R15: 0000000000000000 Feb 19 19:06:53 akran kernel: [3026711.344899] Mem-Info: Feb 19 19:06:53 akran kernel: [3026711.344905] active_anon:14862589 inactive_anon:1133875 isolated_anon:0 Feb 19 19:06:53 akran kernel: [3026711.344905] active_file:467 inactive_file:371 isolated_file:0 Feb 19 19:06:53 akran kernel: [3026711.344905] unevictable:0 dirty:3 writeback:0 unstable:0 ... Feb 19 19:06:53 akran kernel: [3026711.344985] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name Feb 19 19:06:53 akran kernel: [3026711.344997] [ 823] 0 823 44909 0 106496 121 0 lvmetad Feb 19 19:06:53 akran kernel: [3026711.344999] [ 1354] 0 1354 11901 3 135168 112 0 rpcbind Feb 19 19:06:53 akran kernel: [3026711.345000] [ 1485] 0 1485 69911 99 180224 159 0 accounts-daemon ... Feb 19 19:06:53 akran kernel: [3026711.345345] Out of memory: Kill process 25591 (postgres) score 697 or sacrifice child Feb 19 19:06:53 akran kernel: [3026711.346563] Killed process 25591 (postgres) total-vm:71116948kB, anon-rss:52727552kB, file-rss:0kB, shmem-rss:3023196kB The most interesting lines are usually the last two: In this case they tell us that the process killed was a postgres process and it occupied about 71 GB of virtual memory at that time. That was clearly the right choice since the machine has only 64 GB of RAM. Sometimes it is less clear and then you might want to scroll through the (usually long) list of processes to see if there are other processes which need suspicious amounts of RAM or maybe if there are just more of them than you would expect. > I identified one simple select which consuming more memory and here is the > query plan, > > > > "Result (cost=0.00..94891854.11 rows=3160784900 width=288)" > " -> Append (cost=0.00..47480080.61 rows=3160784900 width=288)" > " -> Seq Scan on msghist (cost=0.00..15682777.12 rows=3129490000 width > =288)" > " Filter: (((data -> 'info'::text) ->> 'status'::text) = > 'CLOSE'::text)" > " -> Seq Scan on msghist msghist_1 (cost=0.00..189454.50 rows=31294900 > width=288)" > " Filter: (((data -> 'info'::text) ->> 'status'::text) = > 'CLOSE'::text)" So: How much memory does that use? It produces a huge number of rows (more than 3 billion) but it doesn't do much with them, so I wouldn't expect the postgres process itself to use much memory. Are you sure its the postgres process and not the application which uses a lot of memory? hp -- _ | Peter J. Holzer | Story must make more sense than reality. |_|_) | | | | | hjp@xxxxxx | -- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!"
Attachment:
signature.asc
Description: PGP signature