On 2022-04-28 11:09:12 +0200, Zb B wrote:
> > When the secondary starts up it should continue replicating from where
> > it stopped. However, it can only do this if the necessary information is
> > still available. If WAL files have been deleted in the meantime, it
> > can't replay them. There should be error messages in your logs on what
> > went wrong.
>
> I did another test using different wal_sender_timeout parameter, as the time of
> the secondary being shut down was longer than the default 60s for this
> parameter.

I don't think this will help. It will just make the primary slower in
noticing that the secondary is gone.

> I was hoping it would help but the result was the same (records were not
> replicated to the secondary after the patroni start). Well, I just verified
> again that the records were replicated after about 15 minutes to the secondary,
> so probably the timeout setting helped, or I was not patient enough before.

The latter, I suspect. Although I'm surprised that it takes so long. In
my experience it takes only a few seconds, certainly less than a minute,
for replication to start (how long it takes to finish depends on the
amount of data, of course).

Patroni can nuke the secondary database and create a fresh copy (using
basebackup). That might take 15 minutes (depending on the database
size). I don't think it does that automatically, though. Also I think
you would have noticed that.

What does `patronictl list` show during that interval?

> Is it normal to wait so long for the replication? (the original
> transaction in primary took about 5 minutes and was about 3000 small
> records).
> I am providing more details for completeness below:
>
> I get the following errors on the primary DB:
> 2022-04-28 04:36:50.544 EDT [13794] WARNING: archive_mode enabled, yet
> archive_command is not set
> 2022-04-28 04:37:34.893 EDT [14755] ERROR: replication slot "xyzd3riardb05"
> does not exist
> 2022-04-28 04:37:34.893 EDT [14755] STATEMENT: START_REPLICATION SLOT
> "xyzd3riardb05" 0/7000000 TIMELINE 18 ...
> and after some time such errors stop to appear.

So the replication slot is probably created after some time and then
replication starts to work. I think that replication slot is managed by
Patroni. So the question would be: Why does Patroni take so long to
create it? Did it log anything?

        hp

-- 
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@xxxxxx         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"
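
PS: To put a number on how long the primary kept rejecting the standby,
one could measure the window during which the "replication slot ... does
not exist" error shows up in the primary's log. Below is a minimal Python
sketch, not a tested tool. It assumes a log_line_prefix that produces
lines shaped like the excerpt above (timestamp, time zone, PID in
brackets) and that each message sits on one line in the real log file.
The function name missing_slot_windows is made up for illustration.

```python
import re
from datetime import datetime

# Assumed line shape (taken from the excerpt above, unwrapped):
# 2022-04-28 04:37:34.893 EDT [14755] ERROR: replication slot "x" does not exist
MISSING_SLOT = re.compile(
    r'^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) \S+ \[\d+\] '
    r'ERROR:\s+replication slot "(?P<slot>[^"]+)"\s+does not exist'
)


def missing_slot_windows(lines):
    """Return {slot: (first_seen, last_seen, count)} for missing-slot errors.

    Lines that do not match the assumed format are ignored.
    """
    windows = {}
    for line in lines:
        m = MISSING_SLOT.match(line.strip())
        if not m:
            continue
        ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S.%f")
        # Keep the first timestamp, advance the last one, bump the counter.
        first, _, count = windows.get(m["slot"], (ts, ts, 0))
        windows[m["slot"]] = (first, ts, count + 1)
    return windows


if __name__ == "__main__":
    # The two log lines quoted in this thread, unwrapped to one line each.
    sample = [
        "2022-04-28 04:36:50.544 EDT [13794] WARNING: archive_mode enabled, "
        "yet archive_command is not set",
        '2022-04-28 04:37:34.893 EDT [14755] ERROR: replication slot '
        '"xyzd3riardb05" does not exist',
    ]
    for slot, (first, last, count) in missing_slot_windows(sample).items():
        print(f"{slot}: {count} error(s), {first} to {last} ({last - first})")
```

Fed the full primary log instead of the two sample lines, the last
timestamp per slot should roughly mark the moment the slot was created,
which can then be compared with what Patroni logged around that time.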