Re: standby replication server throws invalid memory alloc request size, does not start up

Andres Freund <andres@xxxxxxxxxxx> · Thu, 28 Jun 2018 09:39:42 -0700

Hi,

On 2018-06-28 13:17:52 +0000, Vijaykumar Jain wrote:
> Hi All,
> 
> This is my first postgres query to admin list, so if I am not following the right standards for asking the question, pls let me know 😊
> 
> 
> The problem:
> 
> I have a postgres cluster as
> 
> A (primary)-> streaming replication -> B(hot_standby=on)
> 
> We had a power outage in one of the data centers, and when we got back, one of the database servers (B, the standby node) seems to show weird errors and is not starting up. A recovered fine, and it is running fine.
> 
> --------- logs
> 
> 2018-06-25 10:57:04 UTC HINT:  In a moment you should be able to reconnect to the database and repeat your command.
> 2018-06-25 10:57:04 UTC WARNING:  terminating connection because of crash of another server process
> 2018-06-25 10:57:04 UTC DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
> 2018-06-25 10:57:04 UTC HINT:  In a moment you should be able to reconnect to the database and repeat your command.

This sounds like there was either a crash or you shut down the server
forcibly.  Could you post the preceding lines?

> 2018-06-25 10:57:04 UTC LOG:  database system is shut down
> 2018-06-27 16:59:28 UTC LOG:  listening on IPv4 address "0.0.0.0", port 5432
> 2018-06-27 16:59:28 UTC LOG:  listening on IPv6 address "::", port 5432
> 2018-06-27 16:59:28 UTC LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
> 2018-06-27 16:59:28 UTC LOG:  database system was interrupted while in recovery at log time 2018-06-25 10:52:21 UTC
> 2018-06-27 16:59:28 UTC HINT:  If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.
> 2018-06-27 16:59:28 UTC LOG:  entering standby mode
> 2018-06-27 16:59:28 UTC LOG:  recovered replication state of node 1 to 63/89B52A98
> 2018-06-27 16:59:28 UTC LOG:  redo starts at 9A/6F1B3888
> 2018-06-27 16:59:28 UTC LOG:  consistent recovery state reached at 9A/6F25BFF0
> 2018-06-27 16:59:28 UTC FATAL:  invalid memory alloc request size 1768185856
> 2018-06-27 16:59:28 UTC LOG:  database system is ready to accept read only connections
> 2018-06-27 16:59:28 UTC LOG:  startup process (PID 11829) exited with exit code 1
> 2018-06-27 16:59:28 UTC LOG:  terminating any other active server processes
> 2018-06-27 16:59:28 UTC LOG:  database system is shut down

Could you restart the server with log_min_messages set to debug3, and
change log_line_prefix so it includes the pid?
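
For reference, a minimal sketch of those two settings in postgresql.conf.
It assumes the current prefix is roughly '%t ' (which is what the
timestamps in the log above suggest); %p is the escape for the process
ID. Adjust the prefix to whatever you already use and restart the
standby afterwards.

```conf
# Temporarily raise server log verbosity for the failing startup
log_min_messages = debug3

# Timestamp followed by the process ID in brackets (assumed layout)
log_line_prefix = '%t [%p] '
```

debug3 is very verbose, so it is worth reverting log_min_messages once
the relevant output has been captured.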

Greetings,

Andres Freund