richard@xxxxxxxxx writes:

> Dear Jerry,
>
> So, yes, it turns out that some kind of loop must be involved here, as
> you described:
>
> 1. ensure the cluster is running
> 2. stop the primary
> 3. wait some time
> 4. stop the replicas
> 5. check whether the checkpoint locations match; repeat from step 1 if
>    out of sync
>
> My question is that the unreliable step here is the 3rd one. Can we
> query the replica at runtime to see whether it has caught up? I mean
> that after stopping the primary, we can obtain the checkpoint location
> from pg_controldata; can we then somehow query the running replica
> about it?

Assuming your client traffic has been stopped ahead of time, and perhaps
you did a lockout via HBA or other means, including forcible termination
of persistent clients (we usually do a restart of the primary to ensure
this)... we don't wait more than a few seconds before also stopping the
replicas, and the vast majority of times all nodes are at the same
checkpoint.

Cheers!

> Thanks in advance,
> Richard
>
> On 2025-02-20 08:49, Jerry Sievers wrote:
>> richard@xxxxxxxxx writes:
>>
>>> Dear Alvaro,
>>>
>>> Thanks for your answers. Unfortunately, I was unaware of a shutdown
>>> record; that makes a difference, then. So, I definitely must stop the
>>> primary first, then use pg_controldata to obtain the checkpoint
>>> info. Can I then query the replicas while they are up and running to
>>> see whether they've received the shutdown record? That is, after
>>> shutting down the primary, how will I know that a replica has
>>> received the mentioned record and is safe to shut down?
>>
>> Hmmm, not sure about that, but what we do is stop the primary, wait a
>> $short time, then stop the replicas...
>>
>> Then run pg_controldata on all nodes, filter out only the line
>> indicating the latest checkpoint, and sort -u the output. Expect only
>> a single line if all nodes match.
>>
>> You may also wish to first ensure that you got the same number of
>> lines as the total node count before doing the sorting and uniquing.
>> Very rarely on our huge systems, we'd have a mismatch after the
>> verification, and in those cases our automated upgrade procedure
>> restarts all nodes and then does the shutdown and verify check again.
>>
>> HTH
>>
>>> Thanks for the clarifications.
>>>
>>> Best regards,
>>> Richard
>>>
>>> On 2025-02-19 16:54, Álvaro Herrera wrote:
>>>> On 2025-Feb-19, richard@xxxxxxxxx wrote:
>>>>
>>>>> With this, I have the question: after the shutdown of the primary,
>>>>> what is the guarantee that the replicas will have the same
>>>>> checkpoint location? Why does the order of shutting down the
>>>>> servers matter? What would be the really exact and reliable way to
>>>>> ensure that the replicas will have the same checkpoint location as
>>>>> the primary?
>>>>
>>>> The replicas can't write WAL by themselves, but they will replay
>>>> whatever the primary has sent; by shutting down the primary first
>>>> and letting the replicas catch up, you ensure that the replicas will
>>>> actually receive the shutdown record and replay it. If you shut down
>>>> the replicas first, they can obviously never catch up with the
>>>> shutdown checkpoint of the primary.
>>>>
>>>> As I recall, if you do shut down the primary first, one potential
>>>> danger is that the primary fails to send the checkpoint record
>>>> before shutting down, so the replicas won't receive it and obviously
>>>> will not replay it; or simply that they are far enough behind that
>>>> they receive it but don't replay it.
>>>>
>>>> You could use pg_controldata to read the last checkpoint info from
>>>> all nodes. You can run it on the primary after shutting it down, and
>>>> then on each replica while it's still running to ensure that the
>>>> correct restartpoint has been created.
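For what it's worth, the match check Jerry describes can be sketched as a
small shell script. This is a minimal sketch, not a drop-in tool: the
three sample lines stand in for real `pg_controldata` output gathered
from a stopped primary and two stopped replicas (the LSN shown is
hypothetical), and in practice you'd collect them over ssh or similar.

```shell
#!/bin/sh
# Verify that all nodes report the same "Latest checkpoint location".
# Two checks, per the procedure above:
#   (a) we collected exactly one line per node, and
#   (b) sort -u collapses the collected lines to a single line.
set -eu

# Stand-in for: for each node, pg_controldata $PGDATA | grep '^Latest checkpoint location:'
lines='Latest checkpoint location:           0/3000028
Latest checkpoint location:           0/3000028
Latest checkpoint location:           0/3000028'

node_count=3

got=$(printf '%s\n' "$lines" | wc -l)
if [ "$got" -ne "$node_count" ]; then
    echo "expected $node_count lines, got $got" >&2
    exit 1
fi

if [ "$(printf '%s\n' "$lines" | sort -u | wc -l)" -eq 1 ]; then
    echo "checkpoints match"
else
    echo "checkpoint mismatch; restart all nodes and retry" >&2
    exit 1
fi
```

As for probing a still-running replica, `SELECT pg_last_wal_replay_lsn();`
reports the last replay position, which stops advancing once the replica
has consumed everything the primary sent; note that it returns the end of
the last replayed record, so it won't equal the primary's "Latest
checkpoint location" (the start of the shutdown checkpoint record)
byte-for-byte.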