On 1/25/25 09:03, Дмитрий wrote:
1) What sort of replication?
- Streaming replication
2) Where are the two servers located relative to each other?
- The servers are located in different data centers.
3) Have there been any software upgrades/network changes recently?
- I don't have any information about software upgrades or network changes.
It would be worth asking the people who do know.
From the log attached to your initial post:
2025-01-25 17:28:01.930 MSK [1196013] LOG: starting PostgreSQL 15.10 on
x86_64-pc-linux-gnu, compiled by gcc (GCC) 11.5.0 20240719 (Red Hat
11.5.0-2), 64-bit
2025-01-25 17:28:01.930 MSK [1196013] LOG: listening on IPv4 address
"0.0.0.0", port 5432
2025-01-25 17:28:01.931 MSK [1196013] LOG: listening on Unix socket
"/run/postgresql/.s.PGSQL.5432"
2025-01-25 17:28:01.932 MSK [1196013] LOG: listening on Unix socket
"/tmp/.s.PGSQL.5432"
2025-01-25 17:28:01.962 MSK [1196017] LOG: database system was shut
down in recovery at 2025-01-25 17:28:01 MSK
2025-01-25 17:28:01.962 MSK [1196017] LOG: entering standby mode
How was it shut down: on purpose, or because of a hardware/software issue?
Also, do you have the corresponding logs from the primary?
2025-01-25 17:28:12.192 MSK [1196017] LOG: consistent recovery state
reached at 1063C/D002DC68
2025-01-25 17:28:12.192 MSK [1196017] LOG: incorrect resource manager
data checksum in record at 1063C/D002DC68
2025-01-25 17:28:12.192 MSK [1196013] LOG: database system is ready to
accept read-only connections
2025-01-25 17:28:12.205 MSK [1196019] LOG: started streaming WAL from
primary at 1063C/D0000000 on timeline 61
The recovery ended and the streaming started.
I'm not sure whether 'incorrect resource manager data checksum' is
significant.
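For cross-checking the positions in these log lines against the files in
pg_wal, here is a rough sketch of how a timeline and LSN map to a WAL segment
file name. It assumes the default 16MB wal_segment_size, which the log does
not confirm, and the function name is made up for illustration.

```python
# Assumption: default wal_segment_size of 16MB; adjust if the cluster
# was initialized with a different --wal-segsize.
WAL_SEGMENT_SIZE = 16 * 1024 * 1024


def wal_segment_name(timeline: int, lsn: str, seg_size: int = WAL_SEGMENT_SIZE) -> str:
    """Return the WAL segment file name that holds the given LSN.

    Hypothetical helper, not part of PostgreSQL. The name is three
    8-digit hex fields: timeline, high part and low part of the
    segment number.
    """
    hi, lo = (int(part, 16) for part in lsn.split("/"))
    segs_per_xlogid = 0x100000000 // seg_size
    segno = ((hi << 32) | lo) // seg_size
    return "%08X%08X%08X" % (timeline, segno // segs_per_xlogid, segno % segs_per_xlogid)


# Streaming start position from the log above, timeline 61.
print(wal_segment_name(61, "1063C/D0000000"))
```

Under that assumption the streaming start position falls in segment
0000003D0001063C000000D0. The leading 0000003D is timeline 61 in hex, the
same timeline prefix as the segment named in the later 'invalid magic number'
messages.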
2025-01-25 17:29:08.452 MSK [1196015] LOG: recovery restart point at
1063C/DBC7E1D8
2025-01-25 17:29:08.452 MSK [1196015] DETAIL: Last completed
transaction was at log time 2025-01-25 16:23:08.828548+03.
2025-01-25 17:29:24.553 MSK [1196015] LOG: restartpoint starting: wal
2025-01-25 17:29:24.553 MSK [1196015] DEBUG: performing replication
slot checkpoint
2025-01-25 17:29:27.651 MSK [1196019] FATAL: could not send data to WAL
stream: lost synchronization with server: got message type "0", length
892351284
2025-01-25 17:29:27.653 MSK [1196017] LOG: invalid magic number 3600 in
log segment 0000003D0001063D000000F4, offset 212992
2025-01-25 17:29:27.653 MSK [1196017] LOG: invalid magic number 3600 in
log segment 0000003D0001063D000000F4, offset 212992
2025-01-25 17:29:27.653 MSK [1196017] LOG: invalid magic number 3600 in
log segment 0000003D0001063D000000F4, offset 212992
This is where things fall apart. What confuses me is:
"could not send data to WAL stream: lost synchronization with server:
got message type "0", length 892351284"
If this is from the standby, why is it sending data to the stream?
Or is there cascading replication going on?
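One thing that might be worth checking about that message is whether the
reported length is really a length at all. A minimal sketch, assuming the
value is the 4-byte big-endian length field of a protocol message header (the
helper name is made up for illustration):

```python
import struct


def length_as_wire_bytes(length: int) -> bytes:
    """Re-pack a reported message length into its four big-endian bytes.

    Hypothetical helper: it only shows what the bytes on the wire would
    have been if they were read as a 32-bit length.
    """
    return struct.pack(">I", length)


raw = length_as_wire_bytes(892351284)
print(raw)            # b'5034'
print(raw.isdigit())  # True: all four bytes are ASCII digits
```

The four bytes are the ASCII digits "5034", and the message type "0" is also
an ASCII digit. That suggests the standby may have been reading text-like
bytes where it expected a message header, which would be consistent with
something corrupting or injecting data on the connection. It does not by
itself say where that happened.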
2025-01-25 17:30:01.887 MSK [1196013] LOG: received fast shutdown request
2025-01-25 17:30:01.888 MSK [1196013] LOG: aborting any active transactions
Was that a manual intervention?
2025-01-25 17:30:02.157 MSK [1196015] LOG: shutting down
2025-01-25 17:30:02.181 MSK [1196013] LOG: database system is shut down
2025-01-25 17:30:02.182 MSK [1196014] DEBUG: logger shutting down
So the server went from startup to shutdown in ~2 minutes.
From your original post:
'Restarting PostgreSQL helps.'
Is that what is shown above, or have you restarted since then and the
server is now running?
--
Adrian Klaver
adrian.klaver@xxxxxxxxxxx