On Wed, Jan 29, 2020 at 10:37 PM Nicola Contu <nicola.contu@xxxxxxxxx> wrote:
> This is the error on postgres log of the segmentation fault :
>
> 2020-01-21 14:20:29 GMT [] [42222]: [108-1] db=,user= LOG: server process (PID 2042) was terminated by signal 11: Segmentation fault
> 2020-01-21 14:20:29 GMT [] [42222]: [109-1] db=,user= DETAIL: Failed process was running: select pid from pg_stat_activity where query ilike 'REFRESH MATERIALIZED VIEW CONCURRENTLY matview_vrs_request_stats'
> 2020-01-21 14:20:29 GMT [] [42222]: [110-1] db=,user= LOG: terminating any other active server processes

Ok, this is a bug.  Do you happen to have a core file?  I don't recall
where CentOS puts them.

> > If you're on Linux, you can probably see them with "ls /dev/shm".
>
> I see a lot of files there, and doing a cat they are empty. What can I do with them?

Not much, but it tells you approximately how many 'slots' are in use
at a given time (ie because of currently running parallel queries), if
they were created since PostgreSQL started up (if they're older ones
they could have leaked from a crashed server, but we try to avoid that
by trying to clean them up when you restart).

> Those are two different problems I guess, but they are related because right before the Segmentation Fault I see a lot of shared segment errors in the postgres log.

That gave me an idea...  I hacked my copy of PostgreSQL to flip a coin
to decide whether to pretend there are no slots free (see below), and
I managed to make it crash in the regression tests when doing a
parallel index build.  It's late here now, but I'll look into that
tomorrow.  It's possible that the parallel index code needs to learn
to cope with that.

#2  0x0000000000a096f6 in SharedFileSetInit (fileset=0x80b2fe14c, seg=0x0) at sharedfileset.c:71
#3  0x0000000000c72440 in tuplesort_initialize_shared (shared=0x80b2fe140, nWorkers=2, seg=0x0) at tuplesort.c:4341
#4  0x00000000005ab405 in _bt_begin_parallel (buildstate=0x7fffffffc070, isconcurrent=false, request=1) at nbtsort.c:1402
#5  0x00000000005aa7c7 in _bt_spools_heapscan (heap=0x801ddd7e8, index=0x801dddc18, buildstate=0x7fffffffc070, indexInfo=0x80b2b62d0) at nbtsort.c:396
#6  0x00000000005aa695 in btbuild (heap=0x801ddd7e8, index=0x801dddc18, indexInfo=0x80b2b62d0) at nbtsort.c:328
#7  0x0000000000645b5c in index_build (heapRelation=0x801ddd7e8, indexRelation=0x801dddc18, indexInfo=0x80b2b62d0, isreindex=false, parallel=true) at index.c:2879
#8  0x0000000000643e5c in index_create (heapRelation=0x801ddd7e8, indexRelationName=0x7fffffffc510 "pg_toast_24587_index", indexRelationId=24603, parentIndexRelid=0,

I don't know if that's the bug that you're hitting, but it definitely
could be: REFRESH MATERIALIZED VIEW could be rebuilding an index.

===

diff --git a/src/backend/storage/ipc/dsm.c b/src/backend/storage/ipc/dsm.c
index 90e0d739f8..f0b49d94ee 100644
--- a/src/backend/storage/ipc/dsm.c
+++ b/src/backend/storage/ipc/dsm.c
@@ -468,6 +468,13 @@ dsm_create(Size size, int flags)
 	nitems = dsm_control->nitems;
 	for (i = 0; i < nitems; ++i)
 	{
+		/* BEGIN HACK */
+		if (random() % 10 > 5)
+		{
+			nitems = dsm_control->maxitems;
+			break;
+		}
+		/* END HACK */
 		if (dsm_control->item[i].refcnt == 0)
 		{
 			dsm_control->item[i].handle = seg->handle;
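
For what it's worth, here is a rough, untested sketch of the kind of
fallback _bt_begin_parallel() might need, under the assumption that the
crash comes from InitializeParallelDSM() failing to get a segment: when
dsm_create() finds no free slot (it is called with
DSM_CREATE_NULL_IF_MAXSEGMENTS), the parallel context ends up with
pcxt->seg == NULL, so the caller would have to abandon the parallel
build instead of handing that NULL segment to
tuplesort_initialize_shared()/SharedFileSetInit(), as in the backtrace
above.

	/*
	 * Rough sketch only, not a tested patch.  Inside _bt_begin_parallel(),
	 * after the parallel context has been set up:
	 */
	InitializeParallelDSM(pcxt);

	/*
	 * If no DSM segment was available (all slots in use), give up on
	 * parallelism; leaving buildstate->btleader unset makes the caller
	 * fall back to a serial sort.  Other cleanup (e.g. any registered
	 * snapshot) is omitted here.
	 */
	if (pcxt->seg == NULL)
	{
		DestroyParallelContext(pcxt);
		ExitParallelMode();
		return;
	}

Something along those lines would make a DSM-starved system degrade to
a serial index build rather than crash, but I'd want to check all the
other callers that assume InitializeParallelDSM() always produces a
segment.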