On 1/26/25 03:29, Дмитрий wrote:
"How was it shut down, on purpose or a hardware/software issue?"
- I reboot the receiver every 2 minutes on purpose. I determined this
interval empirically, because replication breaks down approximately
every minute and a half. The reboot lets the receiver advance again.
[A sketch of such a schedule follows these answers.]
"Also do you have corresponding logs from primary?"
- Attached to this message.
"Unless, is there cascading replication going on?"
- No, this is replication from the leader. The leader has two replicas,
all in the same data center, and the problematic replica is needed for
a migration to another data center.
"Was that a manual intervention?"
- Yes, reboot on schedule, every two minutes.
"Is that what is shown above or have you restarted since the above and
the server is running?"
- Sometimes replication works without problems for several hours. But
when a breakdown occurs, rebooting every two minutes helps this replica
catch up.
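
On the scheduled reboot mentioned above: a minimal hypothetical sketch
of such a schedule using cron, assuming a systemd-managed standby whose
unit is named postgresql.service (the unit name and the use of
systemctl restart are assumptions, not details from the report):

    # hypothetical /etc/cron.d entry: restart the standby every 2 minutes
    */2 * * * * root systemctl restart postgresql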
1) It would make life easier if the log_line_prefix timestamp was set
to the same precision on the primary and the standby. As of now it
looks like the primary has %t (time stamp without milliseconds) and the
standby has %m (time stamp with milliseconds).
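
A minimal sketch of that change in the primary's postgresql.conf; the
full prefix string is reconstructed from the log excerpts below, so
treat it as an assumption:

    # before (seconds precision):
    # log_line_prefix = '%t [%p]: [%l-1] app=%a,user=%u,db=%d,client=%h '
    # after (millisecond precision, matching the standby):
    log_line_prefix = '%m [%p]: [%l-1] app=%a,user=%u,db=%d,client=%h '

The setting can be applied with a reload, e.g. SELECT pg_reload_conf();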
2) From the logs.
Primary:
2025-01-26 12:21:27 MSK [656]: [11-1]
app=v-host-n1,user=replicator,db=[unknown],client=192.168.5.1 STATEMENT:
START_REPLICATION SLOT "slot_migration_to_rcod" 106B6/52000000 TIMELINE 61
2025-01-26 12:21:27 MSK [656]: [12-1]
app=v-host-n1,user=replicator,db=[unknown],client=192.168.5.1 LOG:
disconnection: session time: 0:01:05.329 user=replicator database=
host=192.168.5.1 port=58380
Standby:
2025-01-26 12:21:27.113 MSK [10824] FATAL: could not send data to WAL
stream: lost synchronization with server: got message type "0", length
825373235
Do you know what is issuing the START_REPLICATION SLOT command?
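
One way to check is to look at the walsenders on the primary; a minimal
sketch using the standard pg_stat_replication and pg_replication_slots
views (the slot name is taken from the log above):

    -- walsender processes and the clients connected to them
    SELECT pid, application_name, client_addr, state, sent_lsn, replay_lsn
    FROM pg_stat_replication;

    -- which backend, if any, currently holds the slot from the log
    SELECT slot_name, active, active_pid
    FROM pg_replication_slots
    WHERE slot_name = 'slot_migration_to_rcod';

Matching application_name and client_addr against the log prefix
(app=v-host-n1, client=192.168.5.1) should identify the process.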
Another interesting point: in addition to this replication there are
two more, to the same data center. One of them had the same problem,
but a one-time restart solved it, and that replication is still working
normally. The other has no such problems; it has been working since its
launch, more than a month ago.
--
Adrian Klaver
adrian.klaver@xxxxxxxxxxx