On Tue, Nov 28, 2017 at 10:05 AM, Jakub Glapa <jakub.glapa@xxxxxxxxx> wrote:
> As for the crash. I dug up the initial log and it looks like a segmentation
> fault...
>
> 2017-11-23 07:26:53 CET:192.168.10.83(35238):user@db:[30003]: ERROR: too
> many dynamic shared memory segments

Hmm.  Well, this error can only occur in dsm_create() when it is called
without DSM_CREATE_NULL_IF_MAXSEGMENTS.  parallel.c calls it with that flag
and dsa.c doesn't (perhaps it should, not sure, but that would just change
the error message), so that means the error arose from dsa.c trying to get
more segments.  That would be when Parallel Bitmap Heap Scan tried to
allocate memory.
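For anyone following along, the difference between those two call sites
boils down to roughly this.  It's a hand-written sketch of the two calling
patterns, not the actual parallel.c/dsa.c code, and the wrapper function
names are invented:

#include "postgres.h"
#include "storage/dsm.h"

/* parallel.c-style call: tolerate running out of DSM slots. */
static dsm_segment *
create_segment_with_fallback(Size size)
{
    /*
     * With DSM_CREATE_NULL_IF_MAXSEGMENTS, dsm_create() returns NULL when
     * every DSM slot is already in use, so the caller can fall back to
     * some other strategy (for parallel query: run without workers).
     */
    dsm_segment *seg = dsm_create(size, DSM_CREATE_NULL_IF_MAXSEGMENTS);

    if (seg == NULL)
    {
        /* out of slots -- caller must cope without a segment */
    }
    return seg;
}

/* dsa.c-style call: no flag, so slot exhaustion is an error. */
static dsm_segment *
create_segment_or_error(Size size)
{
    /*
     * Without the flag, running out of slots raises
     * ERROR:  too many dynamic shared memory segments
     * which is the message in the log above.
     */
    return dsm_create(size, 0);
}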
I hacked my copy of PostgreSQL so that it allows only 5 DSM slots and
managed to reproduce a segv crash by trying to run concurrent Parallel
Bitmap Heap Scans.  The stack looks like this:

  * frame #0: 0x00000001083ace29 postgres`alloc_object(area=0x0000000000000000, size_class=10) + 25 at dsa.c:1433
    frame #1: 0x00000001083acd14 postgres`dsa_allocate_extended(area=0x0000000000000000, size=72, flags=4) + 1076 at dsa.c:785
    frame #2: 0x0000000108059c33 postgres`tbm_prepare_shared_iterate(tbm=0x00007f9743027660) + 67 at tidbitmap.c:780
    frame #3: 0x0000000108000d57 postgres`BitmapHeapNext(node=0x00007f9743019c88) + 503 at nodeBitmapHeapscan.c:156
    frame #4: 0x0000000107fefc5b postgres`ExecScanFetch(node=0x00007f9743019c88, accessMtd=(postgres`BitmapHeapNext at nodeBitmapHeapscan.c:77), recheckMtd=(postgres`BitmapHeapRecheck at nodeBitmapHeapscan.c:710)) + 459 at execScan.c:95
    frame #5: 0x0000000107fef983 postgres`ExecScan(node=0x00007f9743019c88, accessMtd=(postgres`BitmapHeapNext at nodeBitmapHeapscan.c:77), recheckMtd=(postgres`BitmapHeapRecheck at nodeBitmapHeapscan.c:710)) + 147 at execScan.c:162
    frame #6: 0x00000001080008d1 postgres`ExecBitmapHeapScan(pstate=0x00007f9743019c88) + 49 at nodeBitmapHeapscan.c:735

(lldb) f 3
frame #3: 0x0000000108000d57 postgres`BitmapHeapNext(node=0x00007f9743019c88) + 503 at nodeBitmapHeapscan.c:156
   153          * dsa_pointer of the iterator state which will be used by
   154          * multiple processes to iterate jointly.
   155          */
-> 156         pstate->tbmiterator = tbm_prepare_shared_iterate(tbm);
   157     #ifdef USE_PREFETCH
   158         if (node->prefetch_maximum > 0)
   159
(lldb) print tbm->dsa
(dsa_area *) $3 = 0x0000000000000000
(lldb) print node->ss.ps.state->es_query_dsa
(dsa_area *) $5 = 0x0000000000000000
(lldb) f 17
frame #17: 0x000000010800363b postgres`ExecGather(pstate=0x00007f9743019320) + 635 at nodeGather.c:220
   217      * Get next tuple, either from one of our workers, or by running the plan
   218      * ourselves.
   219      */
-> 220     slot = gather_getnext(node);
   221     if (TupIsNull(slot))
   222         return NULL;
   223
(lldb) print *node->pei
(ParallelExecutorInfo) $8 = {
  planstate = 0x00007f9743019640
  pcxt = 0x00007f97450001b8
  buffer_usage = 0x0000000108b7e218
  instrumentation = 0x0000000108b7da38
  area = 0x0000000000000000
  param_exec = 0
  finished = '\0'
  tqueue = 0x0000000000000000
  reader = 0x0000000000000000
}
(lldb) print *node->pei->pcxt
warning: could not load any Objective-C class information. This will significantly reduce the quality of type information available.
(ParallelContext) $9 = {
  node = {
    prev = 0x000000010855fb60
    next = 0x000000010855fb60
  }
  subid = 1
  nworkers = 0
  nworkers_launched = 0
  library_name = 0x00007f9745000248 "postgres"
  function_name = 0x00007f9745000268 "ParallelQueryMain"
  error_context_stack = 0x0000000000000000
  estimator = (space_for_chunks = 180352, number_of_keys = 19)
  seg = 0x0000000000000000
  private_memory = 0x0000000108b53038
  toc = 0x0000000108b53038
  worker = 0x0000000000000000
}

I think there are two failure modes here: one of your sessions showed the
"too many ..." error (that's the good case: it ran out of slots, said so,
and our error machinery worked as it should), and another crashed with a
segfault because it tried to use a NULL "area" pointer (that's the bad
case).  I think this is a degenerate case where we completely failed to
launch the parallel query, but we ran the parallel query plan anyway and
this code thinks that the DSA is available.  Oops.

--
Thomas Munro
http://www.enterprisedb.com
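To make that degenerate case concrete, here is a simplified sketch of the
failing branch in BitmapHeapNext() and of one conceivable band-aid.  This
is illustration only, not a patch, and it simplifies the real
nodeBitmapHeapscan.c structure, though pstate, es_query_dsa,
tbm_begin_iterate() and tbm_prepare_shared_iterate() are the real names:

/*
 * Sketch only.  In the crash above, the plan is parallel (pstate != NULL)
 * but no DSM segment or DSA area was ever created (nworkers_launched == 0,
 * es_query_dsa == NULL, tbm->dsa == NULL), so the shared-iterator path
 * passes a NULL dsa_area down into dsa_allocate() and segfaults.
 */
if (pstate == NULL)
{
    /* Plain non-parallel scan: private iterator, no DSA needed. */
    tbmiterator = tbm_begin_iterate(tbm);
}
else if (node->ss.ps.state->es_query_dsa == NULL)
{
    /*
     * Degenerate "parallel" case: no DSA area is available.  One
     * conceivable band-aid (not necessarily the right fix -- arguably
     * this belongs wherever we decide to run the parallel plan at all)
     * is to fall back to a private iterator here.
     */
    tbmiterator = tbm_begin_iterate(tbm);
}
else
{
    /* Real parallel case: build the iterator in shared memory. */
    pstate->tbmiterator = tbm_prepare_shared_iterate(tbm);
}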