>> dsa_allocate could not find 7 free pages

I just got this error message again on all of my worker nodes (I am using
Citus 7.4 rel). The PG core is my own build of release_10_stable (10.4)
out of GitHub on Ubuntu.

What's the best way to debug this? I am running pre-production tests for
the next few days, so I could gather info if necessary. (I cannot pinpoint
a query to repro this yet, as we have 10K queries running concurrently.)

On Mon, Jan 29, 2018 at 1:35 PM, Rick Otten <rottenwindfish@xxxxxxxxx> wrote:
> If I do a "set max_parallel_workers_per_gather=0;" before I run the query
> in that session, it runs just fine.
> If I set it to 2, the query dies with the dsa_allocate error.
>
> I'll use that as a workaround until 10.2 comes out. Thanks! I have
> something that will help.
>
>
> On Mon, Jan 29, 2018 at 3:52 PM, Thomas Munro
> <thomas.munro@xxxxxxxxxxxxxxxx> wrote:
>>
>> On Tue, Jan 30, 2018 at 5:37 AM, Tom Lane <tgl@xxxxxxxxxxxxx> wrote:
>> > Rick Otten <rottenwindfish@xxxxxxxxx> writes:
>> >> I'm wondering if there is anything I can tune in my PG 10.1 database to
>> >> avoid these errors:
>> >
>> >> $ psql -f failing_query.sql
>> >> psql:failing_query.sql:46: ERROR:  dsa_allocate could not find 7 free
>> >> pages
>> >> CONTEXT:  parallel worker
>> >
>> > Hmm.  There's only one place in the source code that emits that message
>> > text:
>> >
>> >     /*
>> >      * Ask the free page manager for a run of pages.  This should always
>> >      * succeed, since both get_best_segment and make_new_segment should
>> >      * only return a non-NULL pointer if it actually contains enough
>> >      * contiguous freespace.  If it does fail, something in our backend
>> >      * private state is out of whack, so use FATAL to kill the process.
>> >      */
>> >     if (!FreePageManagerGet(segment_map->fpm, npages, &first_page))
>> >         elog(FATAL,
>> >              "dsa_allocate could not find %zu free pages", npages);
>> >
>> > Now maybe that comment is being unreasonably optimistic, but it sure
>> > appears that this is supposed to be a can't-happen case, in which case
>> > you've found a bug.
>>
>> This is probably the bug fixed here:
>>
>> https://www.postgresql.org/message-id/E1eQzIl-0004wM-K3%40gemulon.postgresql.org
>>
>> That was back-patched, so 10.2 will contain the fix.  The bug was not
>> in dsa.c itself, but in the parallel query code that mixed up DSA
>> areas, corrupting them.  The problem comes up when the query plan has
>> multiple Gather nodes (and a particular execution pattern) -- is that
>> the case here, in the EXPLAIN output?  That seems plausible given the
>> description of a 50-branch UNION.  The only workaround until 10.2
>> would be to reduce max_parallel_workers_per_gather to 0 to prevent
>> parallelism completely for this query.
>>
>> --
>> Thomas Munro
>> http://www.enterprisedb.com
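For reference, the session-level workaround Rick describes, plus a way to check Thomas's question about multiple Gather nodes, would look roughly like this (a sketch; `failing_query.sql` stands in for whatever query triggers the error):

```sql
-- Check whether the plan has more than one Gather node,
-- which is the pattern the pre-10.2 bug needs to trigger:
EXPLAIN SELECT ...;   -- run the failing query's EXPLAIN and look for
                      -- multiple "Gather" lines in the output

-- Workaround until 10.2: disable parallelism for this session only,
-- then run the affected query. Other sessions keep their normal setting.
SET max_parallel_workers_per_gather = 0;
\i failing_query.sql

-- Restore the default afterwards if the session continues:
RESET max_parallel_workers_per_gather;
```

Since `SET` only affects the current session, this avoids changing `postgresql.conf` and penalizing queries that parallelize safely.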