Thanks Tom Lane for detailing the issue.
>So ... how many is "a bunch"?more than 85
>Looking at the code, it seems like it'd be possible for a sufficiently
>aggressive spawner of incoming connections to reach the
>MaxLivePostmasterChildren limit. While the postmaster would correctly
>reject additional connection attempts after that, what it would not do
>is ensure that any child slots are left for new parallel worker processes.
>So we could hypothesize that the error you're seeing in the log is from
>failure to spawn a parallel worker process, due to being out of child
>slots.
Thanks Tom Lane for detailing the issue.
>Looking at the code, it seems like it'd be possible for a sufficiently
>aggressive spawner of incoming connections to reach the
>MaxLivePostmasterChildren limit. While the postmaster would correctly
>reject additional connection attempts after that, what it would not do
>is ensure that any child slots are left for new parallel worker processes.
>So we could hypothesize that the error you're seeing in the log is from
>failure to spawn a parallel worker process, due to being out of child
>slots.
Thanks Tom Lane for detailing the issue.
we have enabled "max_parallel_workers_per_gather = 4". 20 days before we ran into this issue .
>However, given that max_connections = 500, MaxLivePostmasterChildren()
>would be 1000-plus. This would mean that reaching this condition would
>require *at least* 500 concurrent connection-attempts-that-haven't-yet-
>been-rejected, maybe well more than that if you didn't have close to
>500 legitimately open sessions. That seems like a lot, enough to suggest
>that you've got some pretty serious bug in your client-side logic.
>would be 1000-plus. This would mean that reaching this condition would
>require *at least* 500 concurrent connection-attempts-that-haven't-yet-
>been-rejected, maybe well more than that if you didn't have close to
>500 legitimately open sessions. That seems like a lot, enough to suggest
>that you've got some pretty serious bug in your client-side logic.
below errors observed after crash in postgres logfile :
ERROR: xlog flush request is not satisfied for couple of tables , we have initiated the vacuum full on those tables and the error went off after that.
ERROR: right sibling's left-link doesn't match: block 273660 links to 273500 instead of expected 273661 in index -- observed this error while doing vacuum freeze on databsase , we have dropped this index and created a new one
Observations :
Vacuum freeze analyze job is getting stuck at database end which is initiated thru cronjob, pg_cancel_backend(), pg_termiante_backend() is not able to terminate those stuck process , Restarting the database only able to clear those process , i am thinking this is happening due to corruption (if this is true how can i detect this ? pg_dump ?). is there any way to overcome this problem ?
does migrating the database to a new instance (pg_basebackup and switching over to new instance ) solves this issue ?
Anyway, I think it's clearly a bug that canAcceptConnections() thinks the
number of acceptable connections is identical to the number of allowed
child processes; it needs to be less, by the number of background
processes we want to support. But it seems like a darn hard-to-hit bug,
so I'm not quite sure that that explains your observation.
number of acceptable connections is identical to the number of allowed
child processes; it needs to be less, by the number of background
processes we want to support. But it seems like a darn hard-to-hit bug,
so I'm not quite sure that that explains your observation.
On Fri, 4 Oct 2019 at 03:49, Tom Lane <tgl@xxxxxxxxxxxxx> wrote:
bhargav kamineni <bhargavpostgres@xxxxxxxxx> writes:
>> What was the database doing just before the FATAL line?
> Postgres was rejecting a bunch of connections from a user who is having a
> connection limit set. that was the the FATAL error that i could see in log
> file.
> FATAL,53300,"too many connections for role ""user_app"""
So ... how many is "a bunch"?
Looking at the code, it seems like it'd be possible for a sufficiently
aggressive spawner of incoming connections to reach the
MaxLivePostmasterChildren limit. While the postmaster would correctly
reject additional connection attempts after that, what it would not do
is ensure that any child slots are left for new parallel worker processes.
So we could hypothesize that the error you're seeing in the log is from
failure to spawn a parallel worker process, due to being out of child
slots.
However, given that max_connections = 500, MaxLivePostmasterChildren()
would be 1000-plus. This would mean that reaching this condition would
require *at least* 500 concurrent connection-attempts-that-haven't-yet-
been-rejected, maybe well more than that if you didn't have close to
500 legitimately open sessions. That seems like a lot, enough to suggest
that you've got some pretty serious bug in your client-side logic.
Anyway, I think it's clearly a bug that canAcceptConnections() thinks the
number of acceptable connections is identical to the number of allowed
child processes; it needs to be less, by the number of background
processes we want to support. But it seems like a darn hard-to-hit bug,
so I'm not quite sure that that explains your observation.
regards, tom lane