Hi Stephen,

> On 10. Jul, 2020, at 17:45, Stephen Frost <sfrost@xxxxxxxxxxx> wrote:
>
> Sure, if you know exactly why the former primary failed and have
> confidence that nothing actually bad happened then pg_rewind can work
> (though it's still not what I'd generally recommend).
>
> If you don't actually know what happened to the former primary to cause
> it to fail then I definitely wouldn't use pg_rewind on it since it
> doesn't have any checks to make sure that the data is actually
> generally consistent. These days you could get a bit of a better
> feeling by running pg_checksums against the data dir, but that's not
> going to be as good as doing a pgbackrest delta restore when it comes to
> making sure that everything is valid.

We use Netapp plus continuous archiving. To protect against block
corruption, all our database clusters have been created with initdb -k,
so they should report block corruption in the log.

The usual reason a database cluster goes down is that the server is
shut down, which initiates a switchover and is not problematic. If the
server goes down because of a power outage, system crash or similar, an
automatic failover is initiated, which, in our experience, is also not
problematic. Patroni seems to handle both situations well.

The worst case is that both servers crash, which is pretty unlikely. In
that case we have to perform a volume restore with Netapp and replay the
WAL files since the last snapshot. Should the replica database cluster
be damaged too, we may need to reinit it with Patroni. This is
acceptable even for large database clusters because replication runs
fast. But the probability is very, very small.

Why the -k option of initdb isn't the default anyway is beyond me. Yes,
I know the argument about pg_upgrade messages, which people can't seem
to cope with for some reason, but I can't see the reasoning.
If I wanted to do a pg_upgrade from an older non-checksummed database
cluster to a new major version with checksums, I'd run initdb explicitly
without checksums and perform the upgrade. Then I would enable
checksums, and that's it from then on. It's a one-time, simple command
for each affected database cluster.

So, in my opinion, -k should be the default, and anyone who wants to
create a non-checksummed database cluster should have to say so
explicitly on the command line. This, IMHO, is a reasonable way to move
people to checksums over time as database clusters are migrated.

But then, that's only my opinion. There is no absolute truth.

Cheers,
Paul
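
A minimal sketch of the upgrade path described above (initdb without
checksums, pg_upgrade, then enabling checksums offline). The paths,
version numbers and the DRY_RUN switch are assumptions for illustration
only; by default the script just prints the commands. It assumes a
release where initdb creates clusters without checksums unless -k is
given, and one that ships pg_checksums (PostgreSQL 12 or later):

```shell
#!/bin/sh
set -eu

# Hypothetical locations; adjust to the actual installation.
OLD_BIN=${OLD_BIN:-/usr/lib/postgresql/11/bin}
NEW_BIN=${NEW_BIN:-/usr/lib/postgresql/12/bin}
OLD_DATA=${OLD_DATA:-/var/lib/postgresql/11/main}
NEW_DATA=${NEW_DATA:-/var/lib/postgresql/12/main}

# DRY_RUN=1 (the default here) only prints each command.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

upgrade_then_enable_checksums() {
    # 1. New cluster without -k, so it matches the old cluster's
    #    checksum setting, which pg_upgrade requires.
    run "$NEW_BIN/initdb" -D "$NEW_DATA"

    # 2. Upgrade; both clusters must be stopped at this point.
    run "$NEW_BIN/pg_upgrade" -b "$OLD_BIN" -B "$NEW_BIN" \
        -d "$OLD_DATA" -D "$NEW_DATA"

    # 3. Enable checksums while the new cluster is still shut down.
    run "$NEW_BIN/pg_checksums" --enable -D "$NEW_DATA"
}

upgrade_then_enable_checksums
```

Step 3 rewrites every block in the data directory, so on a large
cluster it adds downtime in proportion to the cluster size.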