Re: Failover Testing Failures: invalid resource manager ID in primary checkpoint record

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Jan 19, 2023 at 6:44 AM Laurenz Albe <laurenz.albe@xxxxxxxxxxx> wrote:


What do you get for "pg_controldata -D datadir | grep checkpoint"?

$ /usr/lib/postgresql/12/bin/pg_controldata -D /var/lib/postgresql/12/main | grep checkpoint
Latest checkpoint location:           B7/5A000028
Latest checkpoint's REDO location:    B7/5A000028
Latest checkpoint's REDO WAL file:    00000001000000B70000005A
Latest checkpoint's TimeLineID:       1
Latest checkpoint's PrevTimeLineID:   1
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID:          0:746
Latest checkpoint's NextOID:          16616
Latest checkpoint's NextMultiXactId:  1
Latest checkpoint's NextMultiOffset:  0
Latest checkpoint's oldestXID:        479
Latest checkpoint's oldestXID's DB:   1
Latest checkpoint's oldestActiveXID:  0
Latest checkpoint's oldestMultiXid:   1
Latest checkpoint's oldestMulti's DB: 1
Latest checkpoint's oldestCommitTsXid:0
Latest checkpoint's newestCommitTsXid:0
Time of latest checkpoint:            Wed 18 Jan 2023 11:08:00 PM UTC

 
That 5A file should be from timeline 2 on the new primary. The new primary started with 59.partial on TL 1 and then 59 on TL2. I can get those logs as well.

Can you show the complete log from the point the server started?

I've sanitized hostnames etc but should still be pretty clear. Here's the initial startup after. I just noticed that there was actually another startup before what I thought was the initial startup. I've included the PG log from the point where I shut down the old primary followed by the initial startups after it became a replica. I'm going to have to review what chef might have done. I was relying on chef to deploy the configs before attempting to restart but it may have tried to start the service early since that first startup doesn't look like it was as a standby, but then it does a restart and begins requesting files and enters standby mode.

2023-01-18 22:56:18.913 UTC [3716] LOG:  received fast shutdown request
2023-01-18 22:56:18.917 UTC [3716] LOG:  aborting any active transactions
2023-01-18 22:56:18.917 UTC [4029] postgres@dtsdb - [unknown] FATAL:  terminating connection due to administrator command
2023-01-18 22:56:18.917 UTC [4028] postgres@postgres - [unknown] FATAL:  terminating connection due to administrator command
2023-01-18 22:56:18.918 UTC [4029] postgres@dtsdb - [unknown] LOG:  disconnection: session time: 1:16:10.946 user=postgres database=dtsdb host=[local]
2023-01-18 22:56:18.920 UTC [3716] LOG:  background worker "logical replication launcher" (PID 3725) exited with exit code 1
2023-01-18 22:56:18.920 UTC [4028] postgres@postgres - [unknown] LOG:  disconnection: session time: 1:16:10.949 user=postgres database=postgres host=[local]
2023-01-18 22:56:18.922 UTC [3719] LOG:  shutting down
2023-01-18 22:56:18.971 UTC [3719] LOG:  checkpoint starting: shutdown immediate
2023-01-18 22:56:18.989 P00   INFO: archive-push command begin 2.43: [pg_wal/00000001000000B700000058] --archive-async --compress-type=lz4 --exec-id=1220-855f7a50 --log-level-console=info --log-level-file=detai
l --pg1-path=/var/lib/postgresql/12/main --process-max=3 --repo1-azure-account=<redacted> --repo1-azure-container=backups --repo1-azure-key=<redacted> --repo1-cipher-pass=<redacted> --repo1-cipher-type=aes-2
56-cbc --repo1-path=/ --repo1-type=azure --spool-path=/var/lib/pg_wal/12/main/spool --stanza=dtsdb
2023-01-18 22:56:19.017 UTC [3719] LOG:  checkpoint complete: wrote 0 buffers (0.0%); 0 WAL file(s) added, 0 removed, 2 recycled; write=0.009 s, sync=0.001 s, total=0.050 s; sync files=0, longest=0.000 s, avera
ge=0.000 s; distance=49151 kB, estimate=49151 kB
2023-01-18 22:56:19.059 UTC [1226] [unknown]@[unknown] - [unknown] LOG:  connection received: host=172.18.11.102 port=60598
2023-01-18 22:56:19.149 UTC [4127] replication@[unknown] - dtsdb LOG:  disconnection: session time: 1:15:54.644 user=replication database= host=X.Y.4.52 port=65174
2023-01-18 22:56:19.240 UTC [3851] replication@[unknown] - dtsdb LOG:  disconnection: session time: 1:16:23.520 user=replication database= host=X.Y.6.26 port=26380
2023-01-18 22:56:19.393 P00   INFO: pushed WAL file '00000001000000B700000058' to the archive asynchronously
2023-01-18 22:56:19.393 P00   INFO: archive-push command end: completed successfully (405ms)
2023-01-18 22:56:19.419 UTC [3716] LOG:  database system is shut down
2023-01-18 23:07:52.344 UTC [7251] LOG:  database system was shut down at 2023-01-18 22:56:18 UTC
2023-01-18 23:07:52.355 UTC [7249] LOG:  database system is ready to accept connections
2023-01-18 23:07:52.856 UTC [7261] [unknown]@[unknown] - [unknown] LOG:  connection received: host=X.Y.6.26 port=49638
2023-01-18 23:07:52.862 UTC [7261] replication@[unknown] - [unknown] LOG:  replication connection authorized: user=replication application_name=dtsdb SSL enabled (protocol=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384, bits=256, compression=off)
2023-01-18 23:07:52.864 UTC [7261] replication@[unknown] - dtsdb LOG:  disconnection: session time: 0:00:00.007 user=replication database= host=X.Y.6.26 port=49638
2023-01-18 23:07:52.957 UTC [7262] [unknown]@[unknown] - [unknown] LOG:  connection received: host=[local]
2023-01-18 23:07:53.468 UTC [7265] [unknown]@[unknown] - [unknown] LOG:  connection received: host=[local]
2023-01-18 23:07:53.469 UTC [7265] postgres@template1 - [unknown] LOG:  connection authorized: user=postgres database=template1 application_name=psql
2023-01-18 23:07:53.473 UTC [7265] postgres@template1 - psql LOG:  disconnection: session time: 0:00:00.005 user=postgres database=template1 host=[local]
2023-01-18 23:07:53.984 UTC [7270] [unknown]@[unknown] - [unknown] LOG:  connection received: host=[local]
2023-01-18 23:07:53.986 UTC [7270] postgres@template1 - [unknown] LOG:  connection authorized: user=postgres database=template1 application_name=psql
2023-01-18 23:07:53.987 UTC [7270] postgres@template1 - psql LOG:  disconnection: session time: 0:00:00.002 user=postgres database=template1 host=[local]
2023-01-18 23:07:54.307 UTC [7271] [unknown]@[unknown] - [unknown] LOG:  connection received: host=[local]
2023-01-18 23:07:54.308 UTC [7272] [unknown]@[unknown] - [unknown] LOG:  connection received: host=[local]
2023-01-18 23:07:54.309 UTC [7271] postgres@dtsdb - [unknown] LOG:  connection authorized: user=postgres database=dtsdb
2023-01-18 23:07:54.309 UTC [7272] postgres@postgres - [unknown] LOG:  connection authorized: user=postgres database=postgres
2023-01-18 23:07:54.499 UTC [7275] [unknown]@[unknown] - [unknown] LOG:  connection received: host=[local]
2023-01-18 23:07:54.500 UTC [7275] postgres@template1 - [unknown] LOG:  connection authorized: user=postgres database=template1 application_name=psql
2023-01-18 23:07:54.501 UTC [7275] postgres@template1 - psql LOG:  disconnection: session time: 0:00:00.002 user=postgres database=template1 host=[local]
2023-01-18 23:07:54.608 UTC [7306] [unknown]@[unknown] - [unknown] LOG:  connection received: host=[local]
2023-01-18 23:07:54.610 UTC [7306] postgres@postgres - [unknown] LOG:  connection authorized: user=postgres database=postgres application_name=psql
2023-01-18 23:07:54.612 UTC [7306] postgres@postgres - psql LOG:  disconnection: session time: 0:00:00.004 user=postgres database=postgres host=[local]
2023-01-18 23:07:54.670 UTC [7317] [unknown]@[unknown] - [unknown] LOG:  connection received: host=[local]
2023-01-18 23:07:54.672 UTC [7317] postgres@postgres - [unknown] LOG:  connection authorized: user=postgres database=postgres application_name=psql
2023-01-18 23:07:54.684 UTC [7317] postgres@postgres - psql LOG:  disconnection: session time: 0:00:00.014 user=postgres database=postgres host=[local]
2023-01-18 23:07:54.804 UTC [7335] [unknown]@[unknown] - [unknown] LOG:  connection received: host=[local]
2023-01-18 23:07:54.805 UTC [7335] postgres@postgres - [unknown] LOG:  connection authorized: user=postgres database=postgres application_name=psql
2023-01-18 23:07:54.808 UTC [7335] postgres@postgres - psql LOG:  disconnection: session time: 0:00:00.003 user=postgres database=postgres host=[local]
2023-01-18 23:07:54.914 UTC [7336] [unknown]@[unknown] - [unknown] LOG:  connection received: host=172.18.11.102 port=46394
2023-01-18 23:07:57.207 UTC [7567] [unknown]@[unknown] - [unknown] LOG:  connection received: host=127.0.0.1 port=54828
2023-01-18 23:07:57.281 UTC [7578] [unknown]@[unknown] - [unknown] LOG:  connection received: host=[local]
2023-01-18 23:07:57.283 UTC [7578] postgres@postgres - [unknown] LOG:  connection authorized: user=postgres database=postgres application_name=psql
2023-01-18 23:07:57.285 UTC [7578] postgres@postgres - psql LOG:  disconnection: session time: 0:00:00.004 user=postgres database=postgres host=[local]
2023-01-18 23:07:57.309 UTC [7580] [unknown]@[unknown] - [unknown] LOG:  connection received: host=127.0.0.1 port=54830
2023-01-18 23:07:57.377 UTC [7591] [unknown]@[unknown] - [unknown] LOG:  connection received: host=[local]
2023-01-18 23:07:57.378 UTC [7591] postgres@postgres - [unknown] LOG:  connection authorized: user=postgres database=postgres application_name=psql
2023-01-18 23:07:57.381 UTC [7591] postgres@postgres - psql LOG:  disconnection: session time: 0:00:00.004 user=postgres database=postgres host=[local]
2023-01-18 23:07:57.858 UTC [7594] [unknown]@[unknown] - [unknown] LOG:  connection received: host=X.Y.6.26 port=16724
2023-01-18 23:07:57.866 UTC [7594] replication@[unknown] - [unknown] LOG:  replication connection authorized: user=replication application_name=dtsdb SSL enabled (protocol=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384, bits=256, compression=off)
2023-01-18 23:07:57.867 UTC [7594] replication@[unknown] - dtsdb LOG:  disconnection: session time: 0:00:00.009 user=replication database= host=X.Y.6.26 port=16724
2023-01-18 23:07:59.937 UTC [7249] LOG:  received fast shutdown request
2023-01-18 23:07:59.942 UTC [7249] LOG:  aborting any active transactions
2023-01-18 23:07:59.942 UTC [7272] postgres@postgres - [unknown] FATAL:  terminating connection due to administrator command
2023-01-18 23:07:59.942 UTC [7271] postgres@dtsdb - [unknown] FATAL:  terminating connection due to administrator command
2023-01-18 23:07:59.943 UTC [7272] postgres@postgres - [unknown] LOG:  disconnection: session time: 0:00:05.634 user=postgres database=postgres host=[local]
2023-01-18 23:07:59.943 UTC [7271] postgres@dtsdb - [unknown] LOG:  disconnection: session time: 0:00:05.635 user=postgres database=dtsdb host=[local]
2023-01-18 23:07:59.945 UTC [7249] LOG:  background worker "logical replication launcher" (PID 7258) exited with exit code 1
2023-01-18 23:07:59.949 UTC [7252] LOG:  shutting down
2023-01-18 23:08:00.082 UTC [7252] LOG:  checkpoint starting: shutdown immediate
2023-01-18 23:08:00.119 UTC [7252] LOG:  checkpoint complete: wrote 0 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.010 s, sync=0.001 s, total=0.044 s; sync files=0, longest=0.000 s, avera
ge=0.000 s; distance=16384 kB, estimate=16384 kB
2023-01-18 23:08:00.124 UTC [7249] LOG:  database system is shut down
2023-01-18 23:08:00.298 UTC [7844] LOG:  database system was shut down at 2023-01-18 23:08:00 UTC
2023-01-18 23:08:00.312 P00   INFO: archive-get command begin 2.43: [00000002.history, pg_wal/RECOVERYHISTORY] --archive-async --exec-id=7846-ba866c3e --log-level-console=info --log-level-file=detail --pg2-host=dtsdb0-foo.xxx.com --pg1-path=/var/lib/postgresql/12/main --pg2-path=/var/lib/postgresql/12/main --process-max=4 --repo1-azure-account=<redacted> --repo1-azure-container=backups --repo1-azure-key=<redacted> --repo1-cipher-pass=<redacted> --repo1-cipher-type=aes-256-cbc --repo1-path=/ --repo1-type=azure --spool-path=/var/lib/pg_wal/12/main/spool --stanza=dtsdb
2023-01-18 23:08:00.337 P00   INFO: found 00000002.history in the repo1: 12-1 archive
2023-01-18 23:08:00.337 P00   INFO: archive-get command end: completed successfully (27ms)
2023-01-18 23:08:00.339 UTC [7844] LOG:  restored log file "00000002.history" from archive
2023-01-18 23:08:00.350 P00   INFO: archive-get command begin 2.43: [00000003.history, pg_wal/RECOVERYHISTORY] --archive-async --exec-id=7848-0ab50415 --log-level-console=info --log-level-file=detail --pg2-host=dtsdb0-foo.xxx.com --pg1-path=/var/lib/postgresql/12/main --pg2-path=/var/lib/postgresql/12/main --process-max=4 --repo1-azure-account=<redacted> --repo1-azure-container=backups --repo1-azure-key=<redacted> --repo1-cipher-pass=<redacted> --repo1-cipher-type=aes-256-cbc --repo1-path=/ --repo1-type=azure --spool-path=/var/lib/pg_wal/12/main/spool --stanza=dtsdb
2023-01-18 23:08:00.375 P00   INFO: unable to find 00000003.history in the archive
2023-01-18 23:08:00.375 P00   INFO: archive-get command end: completed successfully (26ms)
2023-01-18 23:08:00.377 UTC [7844] LOG:  entering standby mode
2023-01-18 23:08:00.389 P00   INFO: archive-get command begin 2.43: [00000002.history, pg_wal/RECOVERYHISTORY] --archive-async --exec-id=7850-17f50374 --log-level-console=info --log-level-file=detail --pg2-host=dtsdb0-foo.xxx.com --pg1-path=/var/lib/postgresql/12/main --pg2-path=/var/lib/postgresql/12/main --process-max=4 --repo1-azure-account=<redacted> --repo1-azure-container=backups --repo1-azure-key=<redacted> --repo1-cipher-pass=<redacted> --repo1-cipher-type=aes-256-cbc --repo1-path=/ --repo1-type=azure --spool-path=/var/lib/pg_wal/12/main/spool --stanza=dtsdb
2023-01-18 23:08:00.412 P00   INFO: found 00000002.history in the repo1: 12-1 archive
2023-01-18 23:08:00.412 P00   INFO: archive-get command end: completed successfully (24ms)
2023-01-18 23:08:00.414 UTC [7844] LOG:  restored log file "00000002.history" from archive
2023-01-18 23:08:00.435 P00   INFO: archive-get command begin 2.43: [00000002000000B70000005A, pg_wal/RECOVERYXLOG] --archive-async --exec-id=7852-6759d81b --log-level-console=info --log-level-file=detail --pg2-host=dtsdb0-foo.xxx.com --pg1-path=/var/lib/postgresql/12/main --pg2-path=/var/lib/postgresql/12/main --process-max=4 --repo1-azure-account=<redacted> --repo1-azure-container=backups --repo1-azure-key=<redacted> --repo1-cipher-pass=<redacted> --repo1-cipher-type=aes-256-cbc --repo1-path=/ --repo1-type=azure --spool-path=/var/lib/pg_wal/12/main/spool --stanza=dtsdb
2023-01-18 23:08:01.139 P00   INFO: found 00000002000000B70000005A in the archive asynchronously
2023-01-18 23:08:01.140 P00   INFO: archive-get command end: completed successfully (706ms)
2023-01-18 23:08:01.141 UTC [7844] LOG:  restored log file "00000002000000B70000005A" from archive
2023-01-18 23:08:01.275 UTC [7844] LOG:  invalid resource manager ID in primary checkpoint record
2023-01-18 23:08:01.275 UTC [7844] PANIC:  could not locate a valid checkpoint record
2023-01-18 23:08:02.222 UTC [7842] LOG:  startup process (PID 7844) was terminated by signal 6: Aborted
2023-01-18 23:08:02.222 UTC [7842] LOG:  aborting startup due to startup process failure
2023-01-18 23:08:02.223 UTC [7842] LOG:  database system is shut down


--
Don Seiler
www.seiler.us

[Index of Archives]     [Postgresql Home]     [Postgresql General]     [Postgresql Performance]     [Postgresql PHP]     [Postgresql Jobs]     [PHP Users]     [PHP Databases]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Databases]     [Yosemite Forum]

  Powered by Linux