Re: In-place upgrade with streaming replicas

Jerry Sievers <gsievers19@xxxxxxxxxxx> · Thu, 20 Feb 2025 01:49:03 -0600

richard@xxxxxxxxx writes:

> Dear Alvaro,
>
> Thanks for your answers. Unfortunately, I was unaware of the shutdown
> record; that makes a difference. So I definitely must stop the
> primary first, then use pg_controldata to obtain the checkpoint
> info. Can I then query the replicas, while they are still up and
> running, to see whether they've received the shutdown record? That is,
> after shutting down the primary, how will I know that a replica has
> received that record and is safe to shut down?

Hmmm, not sure about that, but what we do is stop the primary, wait a
short time, then stop the replicas...

Then run pg_controldata on all nodes, filter the output down to the
"Latest checkpoint location" line, and sort -u the result.  Expect only
a single line if all nodes match.

You may also wish to first ensure that you got the same number of
lines as the total node count before sorting and deduplicating.
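
For illustration only, here is a rough Python sketch of that check, not
our actual tooling.  The function names are made up, and collecting
the pg_controldata output from the other hosts (ssh or whatever you
use) is left out; read_controldata() only covers a local data
directory.

```python
import os
import re
import subprocess

# The line to compare across nodes; "Latest checkpoint's REDO location"
# has a different label and is deliberately not matched.
CHECKPOINT_RE = re.compile(
    r"^Latest checkpoint location:[ \t]*(\S+)[ \t]*$", re.MULTILINE
)


def checkpoint_location(controldata_output):
    """Return the 'Latest checkpoint location' value, or None if absent."""
    match = CHECKPOINT_RE.search(controldata_output)
    return match.group(1) if match else None


def all_nodes_match(outputs, expected_nodes):
    """True only if every node reported a location and all of them agree."""
    locations = [checkpoint_location(text) for text in outputs]
    # Count check first, so a node that produced no line cannot pass silently.
    if len(locations) != expected_nodes or None in locations:
        return False
    # Same idea as `sort -u` leaving exactly one line.
    return len(set(locations)) == 1


def read_controldata(data_dir):
    """Run pg_controldata on a local data directory (hypothetical helper)."""
    result = subprocess.run(
        ["pg_controldata", data_dir],
        capture_output=True, text=True, check=True,
        # Force untranslated labels so the regex above keeps matching.
        env={**os.environ, "LC_ALL": "C"},
    )
    return result.stdout
```

Feed all_nodes_match() one output string per node plus the node count
you expect; anything other than True means do not proceed.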

Very rarely, on our huge systems, the verification turns up a
mismatch; in those cases, our automated upgrade procedure restarts all
nodes and then does the shutdown-and-verify check again.
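
The retry logic is roughly the following, again as a hedged sketch
rather than our real code.  The four callables are placeholders for
whatever stops/starts your nodes and runs the checkpoint comparison,
and the settle time and attempt limit are arbitrary values I picked for
the example.

```python
import time


def shutdown_and_verify(stop_primary, stop_replicas, start_all, verify,
                        settle_seconds=5.0, max_attempts=3):
    """Stop primary, pause, stop replicas, verify; restart and retry on mismatch.

    Returns the attempt number that succeeded, or None if every attempt
    failed (in which case the nodes are left stopped).
    """
    for attempt in range(1, max_attempts + 1):
        stop_primary()
        # Give the replicas a moment to receive and replay the shutdown record.
        time.sleep(settle_seconds)
        stop_replicas()
        if verify():
            return attempt
        # Mismatch: bring everything back up before trying again.
        if attempt < max_attempts:
            start_all()
    return None
```

If it returns None, stop and look at the nodes by hand rather than
carrying on with the upgrade.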

HTH

>
> Thanks for the clarifications.
>
> Best regards,
> Richard
>
> On 2025-02-19 16:54, Álvaro Herrera wrote:
>> On 2025-Feb-19, richard@xxxxxxxxx wrote:
>> 
>>> With this, I have the question, that after the shutdown of primary, what is
>>> the guarantee for replicas having the same checkpoint location? Why does the
>>> order of shutting down the servers matter? What would be the really exact
>>> and reliable way to ensure that replicas will have the same checkpoint
>>> location as the primary?
>>
>> The replicas can't write WAL by themselves, but they will replay
>> whatever the primary has sent; by shutting down the primary first and
>> letting the replicas catch up, you ensure that the replicas will
>> actually receive the shutdown record and replay it.  If you shut down
>> the replicas first, they can obviously never catch up with the shutdown
>> checkpoint of the primary.
>>
>> As I recall, if you do shut down the primary first, one potential danger
>> is that the primary fails to send the checkpoint record before shutting
>> down, so the replicas won't receive it and obviously will not replay it;
>> or simply that they are behind enough that they receive it but don't
>> replay it.
>>
>> You could use pg_controldata to read the last checkpoint info from all
>> nodes.  You can run it on the primary after shutting it down, and then
>> on each replica while it's still running to ensure that the correct
>> restartpoint has been created.