Hello, may I ask you for a feedback?
Thanks a lot
Il giorno mar 21 gen 2020 alle ore 17:14 Nicola Contu <nicola.contu@xxxxxxxxx> ha scritto:
We also reverted this param :cmdv3=# show max_parallel_workers_per_gather;
max_parallel_workers_per_gather
---------------------------------
2
(1 row)It was set to 8.Il giorno mar 21 gen 2020 alle ore 16:06 Nicola Contu <nicola.contu@xxxxxxxxx> ha scritto:Hey Thomas,after a few months, we started having this issue again.So we revert the work_mem parameter to 600MB instead of 2GB.But the issue is still there. A query went to segmentation fault, the DB went to recovery mode and our app went to read only for a few minutes.I understand we can increase max_connections so we can have many more segments.My question is : is there a way to understand the number of segments we reached?Currently we have 220 max_conn so as your formula is 64 + 2* 220 we have about 500 shared segments.We would like to increase that number to 300 or 400 but would be great to understand if there is a way to make sure we will solve the issue as it requires a restart of the service.I know you were also talking about a redesign this part in PostgreSQL. Do you know if anything has changed in any of the newer versions after 11.5?Thanks a lot,NicolaIl giorno gio 12 set 2019 alle ore 01:01 Thomas Munro <thomas.munro@xxxxxxxxx> ha scritto:On Wed, Sep 11, 2019 at 11:20 PM Nicola Contu <nicola.contu@xxxxxxxxx> wrote:
> If the error persist I will try to revert the work_mem.
> Thanks a lot
Hi Nicola,
It's hard to say exactly what the cause of the problem is in your case
and how to avoid it, without knowing what your query plans look like.
PostgreSQL allows 64 + 2 * max_connections segments to exist a time,
and it needs a number of them that depends on work_mem (in the case of
Parallel Hash Join and Parallel Bitmap Index Scan), and also depends
on the number of Gather nodes that appear in the plan, which in some
unusual cases can result from partitioning.
I've seen people reaching this error by running a lot of parallel
queries concurrently. If that's the cause, then you can definitely
get some relief by turning work_mem down, or by turning
max_connections up (even though you don't want to allow more
connections -- because it influences the formula for deciding on the
DSM segment limit). We should probably adjust some of the internal
constants to give us more slots, to avoid that problem, as discussed
here:
https://www.postgresql.org/message-id/flat/CA%2BhUKGL6H2BpGbiF7Lj6QiTjTGyTLW_vLR%3DSn2tEBeTcYXiMKw%40mail.gmail.com
I've also seen people reaching this error by somehow coming up with
plans that had a very large number of Gather nodes in them,
corresponding to partitions; that's probably a bad plan (it'd
presumably be better to terminate parallelism higher up in the plan,
but these plans do seem to exist in the wild; I don't recall exactly
why). I think we need a bit of a redesign so that if there are
multiple Gather nodes, they share the same main DSM segment, instead
of blowing through this limit.
--
Thomas Munro
https://enterprisedb.com