On Sat, Jan 20, 2024, 5:43 PM Chris Angelico <rosuav@xxxxxxxxx> wrote:
PostgreSQL 15 on Debian, both ends of replication.
I'm doing logical replication in a bit of a complex setup. Not sure
how much of this is relevant so I'll give you a lot of detail; sorry
if a lot of this is just noise.
* Bidirectional alternating-master replication. Since I'm still on PG
15, the replication is up in one direction, down in the other, rather
than actually being simultaneous.
* Replication is defined as "for all tables". All relevant tables are
in a single schema, "stillebot".
* Replication was working fine on initial deployment, including a swap
of master/slave.
* One table was created without a primary key, and subsequently
altered to have "replica identity full".
* Replication is not working for this table (stillebot.settings) but
is working for other tables.
I tried restarting Postgres on the slave and monitoring
/var/log/postgresql/postgresql-15-main.log and it all seemed happy.
According to pg_subscription_rel, all tables are in their
"replication" phase:
select srsubstate,srsublsn,relname from pg_subscription_rel join
pg_class on srrelid=oid;
srsubstate | srsublsn | relname
------------+----------+--------------------------
r | | user_followed_categories
r | | config
r | | config_exportable
r | | commands
r | | settings
(5 rows)
If I make a change to user_followed_categories, it works fine. If I
make a change to settings, it does not replicate.
The stillebot.config table seems somewhat desynchronized - rows are
missing - but if I INSERT a new row into it, it gets properly
replicated.
How can I dig into this to determine what's going on? Have I broken
things by creating a table without a primary key? (It's a singleton
table, will only ever have one row in it; could add a meaningless PK
if it helps the replication.) Are there other logs to inspect?
Thanks in advance!
ChrisA
When using replica set to full this kicks off a full table scan for each update or delete this is very expensive. If there are no errors being reported you will find it is working but hung doing full scans. Inserts are just appended to end of heap.
You can copy the replication slot on the primary to peak at which transactions LR is hung on to confirm.
Adding a primary key will fix this issue. Note PG 16 can use indexes to find qualifying rows when a table's replica is set to full.