On 3/8/24 14:50, Steve Baldwin wrote:
Hi,
I'm in the process of migrating a cluster from 15.3 to 16.2. We have a
'zero downtime' requirement so I'm using logical replication to create
the new cluster and then perform the switch in the application.
I have a situation where all but one table have done their initial
copy. The remaining table is the largest (of course), and the
replication slot that is assigned for the copy
(pg_378075177_sync_60067_7343845372910323059) is showing as
'active=false' if I select from pg_replication_slots on the publisher.
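
For anyone following along, here is a rough sketch of how that slot name can be taken apart to find the table it belongs to. It assumes the tablesync slot naming used by recent PostgreSQL releases (pg_<subscription oid>_sync_<relation oid>_<system identifier>), with both OIDs being subscriber-side values. The helper name and the two query strings are illustrative only, not something from the original report; the queries use psycopg-style named placeholders.

```python
import re
from typing import NamedTuple


class SyncSlot(NamedTuple):
    subscription_oid: int   # pg_subscription.oid on the subscriber
    relation_oid: int       # OID of the table on the subscriber
    system_identifier: int  # cluster system identifier embedded in the name


# Assumed tablesync slot name layout: pg_<suboid>_sync_<relid>_<sysid>
_SYNC_SLOT_RE = re.compile(r"^pg_(\d+)_sync_(\d+)_(\d+)$")


def parse_sync_slot_name(slot_name: str) -> SyncSlot:
    """Split a tablesync slot name into its numeric components."""
    match = _SYNC_SLOT_RE.match(slot_name)
    if match is None:
        raise ValueError(f"not a tablesync slot name: {slot_name!r}")
    return SyncSlot(*(int(group) for group in match.groups()))


# Run on the SUBSCRIBER: shows the sync state of the table behind the slot.
# srsubstate: i = initialize, d = data is being copied,
#             f = finished table copy, s = synchronized, r = ready.
SUBSCRIBER_STATE_SQL = """
SELECT srrelid::regclass AS table_name, srsubstate, srsublsn
FROM pg_subscription_rel
WHERE srsubid = %(subscription_oid)s
  AND srrelid = %(relation_oid)s
"""

# Run on the PUBLISHER: shows whether anything is attached to the slot.
PUBLISHER_SLOT_SQL = """
SELECT slot_name, active, active_pid, restart_lsn,
       confirmed_flush_lsn, wal_status
FROM pg_replication_slots
WHERE slot_name = %(slot_name)s
"""

slot = parse_sync_slot_name("pg_378075177_sync_60067_7343845372910323059")
print(slot)
```

If the subscriber-side query shows the table stuck in state 'd' while the publisher-side slot has no active_pid, that would at least confirm that no tablesync worker is currently connected for that table.
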
I've checked the recent logs for both the publishing cluster and the
subscribing cluster but I can't see any replication errors. I guess I
could have missed them, but it doesn't seem like anything is being
'retried' like I've seen in the past with replication errors.
I've used this mechanism for zero-downtime upgrades multiple times in
the past, and have recently used it to upgrade smaller clusters from
15.x to 16.2 without issue.
The clusters are hosted on AWS RDS, so I have no access to the
servers, but if that's the only way to diagnose the issue, I can
create a support case.
Does anyone have any suggestions as to where I should look for the issue?
Thanks,
Steve
In our setup we're logically replicating a 450G database hosted on real
hardware to an RDS instance.
Multiple times we've had replication simply stop and we could never find
any reason for that on either publisher or subscriber.
The *only* solution that ever worked in these cases was dropping the
subscription in RDS and re-creating it with (copy_data = false).
At that point replication picks right up again for new transactions
*but* at the expense of losing all of the WAL that should have been
replicated during the outage. I wrote a Python-based "logical
replication fixer" to fill in those gaps.
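
The drop-and-recreate step described above can be sketched roughly as follows. This only builds the two SQL statements; it is not the "fixer" script mentioned above. The function name, the subscription and publication names, and the connection string are all placeholders. The quoting helpers are deliberately simple and assume standard_conforming_strings is on.

```python
def quote_ident(name: str) -> str:
    """Double-quote an SQL identifier, escaping embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value: str) -> str:
    """Single-quote an SQL string literal, escaping embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def recreate_subscription_sql(subscription: str, publication: str,
                              conninfo: str) -> list[str]:
    """Return the statements to run on the subscriber, in order.

    By default DROP SUBSCRIPTION also drops the associated replication
    slot on the publisher, so changes made between the stall and the
    re-create are not replayed; they have to be reconciled separately.
    """
    return [
        f"DROP SUBSCRIPTION {quote_ident(subscription)};",
        (
            f"CREATE SUBSCRIPTION {quote_ident(subscription)} "
            f"CONNECTION {quote_literal(conninfo)} "
            f"PUBLICATION {quote_ident(publication)} "
            "WITH (copy_data = false);"
        ),
    ]


# Hypothetical names and connection string, for illustration only.
statements = recreate_subscription_sql(
    "my_sub", "my_pub", "host=publisher.example dbname=app user=repl"
)
for statement in statements:
    print(statement)
```

Both statements need to be run outside an explicit transaction block, because each one touches a replication slot on the publisher.
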
Given that the subscriber is the one that initiates the connection to
the publisher, and that replication resumes as soon as the subscription
is dropped and re-created, my hunch is that this is squarely on RDS.
With both publisher and subscriber on RDS, as in your case, YMMV.
RDS is a black box--who knows what's really going on there? It would be
interesting to see what the response is after you open a support case.
I hope you'll be able to share that with the list.
Jeff