OK, that makes sense. Using something that is unique to each subscriber seems sensible, and the postmaster startup time sounds reasonable!
Thanks for looking at it.
On Thu, Mar 23, 2023 at 8:17 AM Kyotaro Horiguchi <horikyota.ntt@xxxxxxxxx> wrote:
At Wed, 22 Mar 2023 09:25:37 +0000, Will Roper <will.roper@xxxxxxxxxxxxxxxxxxxx> wrote in
> Thanks for the response Hou,
>
> I've had a look and when the tablesync workers are spinning up there are
> some errors of the form:
>
> "2023-03-17 18:37:06.900 UTC [4071] LOG: logical replication table
> synchronization worker for subscription
> ""polling_stations_0561a02f66363d911"", table ""uk_geo_utils_onspd"" has
> started"
> "2023-03-17 18:37:06.976 UTC [4071] ERROR: could not create replication
> slot ""pg_37986_sync_37922_7210774007126708177"": ERROR: replication slot
> ""pg_37986_sync_37922_7210774007126708177"" already exists"
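The slot name format is "pg_<suboid>_sync_<relid>_<systemid>". If the
subscribers were all created from the same backup, they share both the
system identifier and the subscription OID, so it's no surprise that
their tablesync slot names collide.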
If that's true, the simplest workaround would be to drop and recreate
the subscription a different number of times on each subscriber, so
that every subscriber ends up with a subscription OID that differs
from the others.
I believe it's not prohibited for subscribers to share the same system
identifier, but the tablesync slot name generation logic doesn't
account for that case. We might need some server-wide value that's
unique among subscribers and stable while table sync is running. I
can't think of a better place to store it than pg_subscription, but I
don't like that, because the value isn't needed for most of the
subscription's life.
Do you think using the postmaster's startup time would work for this
purpose? I'm assuming that the slot name doesn't need to persist
across server restarts, but I'm not sure that's really true.
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index 07eea504ba..a5b4f7cf7c 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -1214,7 +1214,7 @@ ReplicationSlotNameForTablesync(Oid suboid, Oid relid,
 								char *syncslotname, Size szslot)
 {
 	snprintf(syncslotname, szslot, "pg_%u_sync_%u_" UINT64_FORMAT, suboid,
-			 relid, GetSystemIdentifier());
+			 relid, PgStartTime);
 }
 
 /*

regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center