Search Postgresql Archives

Re: ERROR: too many dynamic shared memory segments

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

we only had the "too many shared too many dynamic shared memory segments" error but no segmentation faults. The error started occurring after upgrading from postgres 10 to postgres 12 (server has 24 cores / 48 threads, i.e. many parallel workers). The error itself was not that much of a problem but /dev/shm started filling up with orphaned files which probably (?) had not been cleaned up by postgres after the parallel workers died. In consequence, after some time, /dev/shm was full and everything crashed.

Unfortunately, the only "solution" we found so far was to increase max connections from 100 to 1000. After that (about 2 months ago I think), the error had gone.

Maybe this helps...

Julian

Am Mi., 29. Jan. 2020 um 10:37 Uhr schrieb Nicola Contu <nicola.contu@xxxxxxxxx>:
This is the error on postgres log of the segmentation fault :

2020-01-21 14:20:29 GMT [] [42222]: [108-1] db=,user= LOG:  server process (PID 2042) was terminated by signal 11: Segmentation fault
2020-01-21 14:20:29 GMT [] [42222]: [109-1] db=,user= DETAIL:  Failed process was running: select pid from pg_stat_activity where query ilike 'REFRESH MATERIALIZED VIEW CONCURRENTLY matview_vrs_request_stats'
2020-01-21 14:20:29 GMT [] [42222]: [110-1] db=,user= LOG:  terminating any other active server processes
2020-01-21 14:20:29 GMT [127.0.0.1(34256)] [2724]: [4-1] db=cmdv3,user=admin WARNING:  terminating connection because of crash of another server process
2020-01-21 14:20:29 GMT [127.0.0.1(34256)] [2724]: [5-1] db=cmdv3,user=admin DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2020-01-21 14:20:29 GMT [127.0.0.1(34256)] [2724]: [6-1] db=cmdv3,user=admin HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2020-01-21 14:20:29 GMT [127.0.0.1(34026)] [2055]: [5-1] db=cmdv3,user=admin WARNING:  terminating connection because of crash of another server process

At CentOS level :
Jan 21 14:20:29 usnyh-cmd1.gt-t.net kernel: postmaster[2042]: segfault at 0 ip 000000000048bef4 sp 00007ffdf4955bb0 error 4 in postgres[400000+6c5000]
Jan 21 14:20:29 usnyh-cmd1.gt-t.net pgbouncer: 2020-01-21 14:20:29.905 GMT [33727] WARNING C-0x21526e8: cmdv3/admin@10.151.2.154:39688 pooler error: server conn crashed?
Jan 21 14:20:29 usnyh-cmd1.gt-t.net pgbouncer: 2020-01-21 14:20:29.905 GMT [33727] WARNING S-0x1f5ff58: cmdv3/admin@127.0.0.1:5432 got packet 'N' from server when not linked
Jan 21 14:20:29 usnyh-cmd1.gt-t.net pgbouncer: 2020-01-21 14:20:29.905 GMT [33727] WARNING S-0x25097f0: cmdv3/admin@127.0.0.1:5432 got packet 'N' from server when not linked
Jan 21 14:20:29 usnyh-cmd1.gt-t.net pgbouncer: 2020-01-21 14:20:29.906 GMT [33727] WARNING S-0x2508b60: cmdv3/admin@127.0.0.1:5432 got packet 'N' from server when not linked
Jan 21 14:20:29 usnyh-cmd1.gt-t.net pgbouncer: 2020-01-21 14:20:29.981 GMT [33727] ERROR S: login failed: FATAL: the database system is in recovery mode


> If you're on Linux, you can probably see them with "ls /dev/shm".

I see a lot of files there, and doing a cat they are empty. What can I do with them?

Those are two different problems I guess, but they are related because right before the Segmentation Fault I see a lot of shared segment errors in the postgres log.

Il giorno mer 29 gen 2020 alle ore 10:09 Thomas Munro <thomas.munro@xxxxxxxxx> ha scritto:
On Wed, Jan 22, 2020 at 4:06 AM Nicola Contu <nicola.contu@xxxxxxxxx> wrote:
> after a few months, we started having this issue again.
> So we revert the work_mem parameter to 600MB instead of 2GB.
> But the issue is still there. A query went to segmentation fault, the DB went to recovery mode and our app went to read only for a few minutes.

Hi Nicola,
Hmm, a segmentation fault sounds like a different problem.  Can you
please share the exact error messages from PostgreSQL and OS logs?

> I understand we can increase max_connections so we can have many more segments.
>
> My question is : is there a way to understand the number of segments we reached?

If you're on Linux, you can probably see them with "ls /dev/shm".

> Currently we have 220 max_conn so as your formula is 64 + 2* 220 we have about 500 shared segments.
> We would like to increase that number to 300 or 400 but would be great to understand if there is a way to make sure we will solve the issue as it requires a restart of the service.
>
> I know you were also talking about a redesign this part in PostgreSQL. Do you know if anything has changed in any of the newer versions after 11.5?

It's possible that we should increase a couple of constants used the
formula -- I'll look into that again.  But first I'd like to see if
we're even investigating the right problem here.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]

  Powered by Linux