Hey folks!
We've been having some issues with logical replication.
After running for some time and streaming messages to a single client, it seemingly hangs.
The sent_lsn value stops increasing, so the client apparently never receives new messages and therefore never sends any feedback.
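To make "stops increasing" concrete, here is a minimal Python sketch of the kind of check involved. It assumes the sent_lsn values are sampled periodically (for example from pg_stat_replication) and passed in as text; the helper names lsn_to_int and is_stalled are purely illustrative and not part of any library.

```python
from typing import Sequence


def lsn_to_int(lsn: str) -> int:
    """Convert a textual LSN such as '16/B374D848' to an integer.

    The text form is two hexadecimal numbers separated by a slash:
    the high 32 bits and the low 32 bits of the WAL position.
    """
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def is_stalled(samples: Sequence[str]) -> bool:
    """Return True if no later sample advances past the first one.

    `samples` is a hypothetical list of sent_lsn values collected
    over time, oldest first.
    """
    if len(samples) < 2:
        return False  # not enough samples to judge
    first = lsn_to_int(samples[0])
    return all(lsn_to_int(s) <= first for s in samples[1:])
```

Comparing the integer form avoids mistakes from comparing the LSN strings directly, since the two hex parts are not zero-padded.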
When we close the client application, making sure the connection is closed on the client side, the backend sometimes stays alive and does not let us drop the logical replication slot unless we call pg_terminate_backend() on it.
Even then, most of the time dropping the replication slot still fails with an "active for PID foo" error, even after pg_terminate_backend(foo) has returned true multiple times.
Before the replication hangs, we normally see the same wal_end LSN repeated in the replication messages for a few, which could indicate that nothing new is coming into the replication slot. Is this assumption correct?
This server also has physical replication set up with one replica. That is one of the differences from the instance we are using for comparison, which has no physical replication and does not seem to show this issue.
I'm looking for guidance or ideas on what to investigate further to sort out this behavior.
Thanks in advance.