Re: ERROR: too many dynamic shared memory segments

Nicola Contu <nicola.contu@xxxxxxxxx> · Tue, 21 Jan 2020 16:06:05 +0100

Hey Thomas, after a few months we started having this issue again.
So we reverted work_mem from 2GB to 600MB.
But the issue is still there. A query hit a segmentation fault, the DB went into recovery mode, and our app was read-only for a few minutes.

I understand we can increase max_connections so we can have many more segments.

My question is: is there a way to find out how many segments we have reached?
Currently we have max_connections = 220, so by your formula (64 + 2 * 220) we have about 500 shared segments.
We would like to increase max_connections to 300 or 400, but it would be great to know whether there is a way to make sure this will solve the issue, since the change requires a restart of the service.
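A quick check of that arithmetic for the candidate values, as a minimal sketch that assumes the 64 + 2 * max_connections formula applies exactly as quoted in this thread (the constant names below are illustrative, not PostgreSQL identifiers):

```python
# Sketch: approximate DSM segment limit using the formula quoted in this
# thread, 64 + 2 * max_connections. Constant names are illustrative only.
FIXED_SLOTS = 64       # fixed part of the limit, as quoted in the thread
SLOTS_PER_BACKEND = 2  # per-connection part, as quoted in the thread


def dsm_segment_limit(max_connections: int) -> int:
    """Approximate maximum number of DSM segments that can exist at once."""
    return FIXED_SLOTS + SLOTS_PER_BACKEND * max_connections


if __name__ == "__main__":
    for conns in (220, 300, 400):
        print(f"max_connections={conns}: ~{dsm_segment_limit(conns)} segments")
```

Under that assumption, going from 220 to 300 or 400 connections raises the limit from 504 to 664 or 864 segments; whether that is enough depends on how many segments the workload actually needs at peak.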

I know you also mentioned redesigning this part of PostgreSQL. Do you know if anything has changed in any version newer than 11.5?

Thanks a lot,
Nicola

On Thu, 12 Sep 2019 at 01:01, Thomas Munro <thomas.munro@xxxxxxxxx> wrote:
On Wed, Sep 11, 2019 at 11:20 PM Nicola Contu <nicola.contu@xxxxxxxxx> wrote:

> If the error persists, I will try to revert the work_mem.
> Thanks a lot

Hi Nicola,

It's hard to say exactly what the cause of the problem is in your case
and how to avoid it, without knowing what your query plans look like.

PostgreSQL allows 64 + 2 * max_connections segments to exist at a time,
and it needs a number of them that depends on work_mem (in the case of
Parallel Hash Join and Parallel Bitmap Index Scan), and also depends
on the number of Gather nodes that appear in the plan, which in some
unusual cases can result from partitioning.

I've seen people reaching this error by running a lot of parallel
queries concurrently.  If that's the cause, then you can definitely
get some relief by turning work_mem down, or by turning
max_connections up (even though you don't want to allow more
connections -- because it influences the formula for deciding on the
DSM segment limit).  We should probably adjust some of the internal
constants to give us more slots, to avoid that problem, as discussed
here:

https://www.postgresql.org/message-id/flat/CA%2BhUKGL6H2BpGbiF7Lj6QiTjTGyTLW_vLR%3DSn2tEBeTcYXiMKw%40mail.gmail.com
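To illustrate why raising max_connections helps even when the extra connections go unused, here is a rough headroom estimate. It is a sketch under two assumptions: the limit follows the formula quoted in this thread, and every running parallel query holds the same number of segments. `segments_per_query` is a hypothetical input; the real figure depends on the plan (Gather nodes, Parallel Hash Join) and on work_mem.

```python
# Rough headroom estimate: how many parallel queries can hold their DSM
# segments at once. segments_per_query is a hypothetical, caller-supplied
# figure; the real number depends on the plan and on work_mem.


def dsm_segment_limit(max_connections: int) -> int:
    """Limit as quoted in this thread: 64 + 2 * max_connections."""
    return 64 + 2 * max_connections


def max_concurrent_queries(max_connections: int, segments_per_query: int) -> int:
    """Number of queries, each holding segments_per_query segments, that fit under the limit."""
    if segments_per_query <= 0:
        raise ValueError("segments_per_query must be positive")
    return dsm_segment_limit(max_connections) // segments_per_query


if __name__ == "__main__":
    # A higher max_connections raises the limit, so more such queries fit.
    for conns in (220, 400):
        print(conns, max_concurrent_queries(conns, segments_per_query=10))
```

With a guessed 10 segments per query, this gives room for about 50 concurrent queries at max_connections = 220 and about 86 at 400. Lowering work_mem works on the other side of the same division, by reducing the segments each query needs.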

I've also seen people reaching this error by somehow coming up with
plans that had a very large number of Gather nodes in them,
corresponding to partitions; that's probably a bad plan (it'd
presumably be better to terminate parallelism higher up in the plan,
but these plans do seem to exist in the wild; I don't recall exactly
why).  I think we need a bit of a redesign so that if there are
multiple Gather nodes, they share the same main DSM segment, instead
of blowing through this limit.
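A sketch of how much such a plan costs in segments today versus under the proposed redesign. It assumes one main DSM segment per Gather node today, models the shared main segment (which is only a proposal here, not existing behavior) with `shared_main=True`, and takes any additional segments (for example from Parallel Hash Join) as a hypothetical `extra_segments` guess rather than modeling them.

```python
# Rough per-query DSM segment count for a plan with many Gather nodes
# (e.g. one per partition). Assumes one main segment per Gather node today;
# shared_main=True models the *proposed* redesign, which is hypothetical.


def segments_needed(gather_nodes: int, extra_segments: int = 0,
                    shared_main: bool = False) -> int:
    """Estimate segments one query holds; extra_segments is a caller-supplied guess."""
    if gather_nodes < 0 or extra_segments < 0:
        raise ValueError("counts must be non-negative")
    if shared_main:
        # All Gather nodes would share a single main segment.
        main_segments = 1 if gather_nodes > 0 else 0
    else:
        # Each Gather node gets its own main segment.
        main_segments = gather_nodes
    return main_segments + extra_segments


if __name__ == "__main__":
    limit = 64 + 2 * 220  # formula quoted in this thread, max_connections = 220
    today = segments_needed(100)  # plan with 100 Gather nodes
    redesigned = segments_needed(100, shared_main=True)
    print(f"today: {today} segments/query, {limit // today} such queries fit")
    print(f"shared main segment: {redesigned} segment/query, {limit // redesigned} fit")
```

Under these assumptions, a plan with 100 Gather nodes would let only 5 such queries run at once against a 504-segment limit, which is why a partition-heavy plan can exhaust the limit on its own.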

-- 
Thomas Munro
https://enterprisedb.com