Fair question, I should have been more elaborate.
When I have the creation of a replication slot, before I insert any data into the table in my example (the SQL-file named "broken" in my gist), after I then create the subscription on the receiving side, it does the initial sync of data and then the replication fails with these messages repeating on the receiving server:
[67] LOG: logical replication apply worker for subscription "widgets_sub" has started
[67] DEBUG: connecting to publisher using connection string "dbname=testdb host=master port=5432 user=repuser password=abc123"
[67] ERROR: could not receive data from WAL stream: ERROR: publication "widgets_pub" does not exist
CONTEXT: slot "my_slot", output plugin "pgoutput", in the change callback, associated LSN 0/1674958
[1] DEBUG: unregistering background worker "logical replication worker for subscription 16396"
[1] LOG: background worker "logical replication worker" (PID 67) exited with exit code 1
[67] DEBUG: connecting to publisher using connection string "dbname=testdb host=master port=5432 user=repuser password=abc123"
[67] ERROR: could not receive data from WAL stream: ERROR: publication "widgets_pub" does not exist
CONTEXT: slot "my_slot", output plugin "pgoutput", in the change callback, associated LSN 0/1674958
[1] DEBUG: unregistering background worker "logical replication worker for subscription 16396"
[1] LOG: background worker "logical replication worker" (PID 67) exited with exit code 1
I can verify that the publication called widgets_pub does exist, and I am not seeing any errors on the sending side.
The SQL-file named "works" just has the creation of the replication slot moved down to after I insert some data into the table I want to replicate. And that works as expected.
On Wed, 20 May 2020 at 15:00, Peter Eisentraut <peter.eisentraut@xxxxxxxxxxxxxxx> wrote:
On 2020-05-19 09:22, Anders Bøgh Bruun wrote:
> I have run into a (to me) weird issue with logical replication. We are
> running Zalandos postgres-operator in our Kubernetes clusters and have
> recently had a use-case where we wanted to start doing logical
> replication of select tables to a data warehouse, also running postgres.
> It worked as expected at first, but then after a pod-restart in
> Kubernetes, the replication slots that were created for the subscription
> were gone. A bit of reading later, and I learn we need to tell Patroni
> which slots should be permanently available, so we specify a slot and
> try to set this up, but then run into an error which says the
> publication does not exist, even though we can verify that it does. At
> first I suspected Patroni handling the replication slots to be the cause
> of the problem, but about a week's worth of learning and experimenting
> later, I can now reliably replicate the problem in pure postgres.
> Patroni is kind of the catalyst, since my findings are that if the
> replication slot is created before data is inserted into the source
> database, and a publication is created, then it breaks. If the
> replication slot is created after data is inserted and the publication
> is created, then it works. We just can't tell Patroni to not create it
> until some arbitrary point in time. I am guessing this is either a bug
> or a case of us not knowing what we are doing...
What does "works" and "breaks" mean? Are there error messages shown
from commands or in the server logs? Is replication progressing, but
doing something your are not expecting? etc.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Anders Bøgh Bruun
Infrastructure Architect
CellPoint
digital
cellpointdigital.com
WE
MAKE TRAVEL EASIER™
M: +45 31 14 87 41
Chicago | Copenhagen | Dubai | London | Miami | Pune | Singapore