Reproducer on 12.4:

On the primary:

  CREATE TABLE repli (id bigint PRIMARY KEY, val text NOT NULL);
  CREATE PUBLICATION repli_pub FOR TABLE repli;

On the standby:

  CREATE TABLE repli (id bigint PRIMARY KEY, val text NOT NULL);
  CREATE SUBSCRIPTION repli_sub CONNECTION '...' PUBLICATION repli_pub;

Logical replication works fine.

Now connect to the primary and run:

  BEGIN;
  SELECT format('CREATE TABLE table%s (id SERIAL PRIMARY KEY, somename text UNIQUE);', foo)
  FROM generate_series(1, 13200) AS foo \gexec

13200 tables are created. Don't commit the transaction yet and verify that logical replication is still working.

Now commit the transaction; logical replication stops working.

The log on the primary has lots of:

2020-08-21 12:41:47.899 CEST [25222] LOG: connection received: host=[local]
2020-08-21 12:41:47.900 CEST [25222] LOG: replication connection authorized: user=postgres application_name=repli_sub
2020-08-21 12:41:47.901 CEST [25222] ERROR: replication slot "repli_sub" is active for PID 23911

The standby has:

2020-08-21 12:41:47.897 CEST [25221] LOG: logical replication apply worker for subscription "repli_sub" has started
2020-08-21 12:41:47.901 CEST [25221] ERROR: could not start WAL streaming: ERROR: replication slot "repli_sub" is active for PID 23911
2020-08-21 12:41:47.903 CEST [22901] LOG: background worker "logical replication worker" (PID 25221) exited with exit code 1

When I try a fast shutdown on the primary server, the WAL sender just doesn't stop; it can only be killed with SIGKILL.

I set "wal_sender_timeout = 6000s" and "log_min_messages = debug3" on the primary and "wal_receiver_timeout = 6000s" on the standby and restarted the servers.

I see messages like this for the WAL sender:

2020-08-21 13:53:34.321 CEST [33594] LOG: connection received: host=[local]
2020-08-21 13:53:34.321 CEST [33594] DEBUG: postmaster child[33594]: starting with (
2020-08-21 13:53:34.321 CEST [33594] DEBUG: postgres
2020-08-21 13:53:34.321 CEST [33594] DEBUG: )
2020-08-21 13:53:34.321 CEST [33594] DEBUG: InitPostgres
2020-08-21 13:53:34.321 CEST [33594] LOG: replication connection authorized: user=postgres application_name=repli_sub
2020-08-21 13:53:34.326 CEST [33594] DEBUG: received replication command: IDENTIFY_SYSTEM
2020-08-21 13:53:34.326 CEST [33594] DEBUG: received replication command: START_REPLICATION SLOT "repli_sub" LOGICAL 28/48643248 (proto_version '1', publication_names '"repli_pub"')
2020-08-21 13:53:34.326 CEST [33594] DEBUG: cannot stream from 28/48643248, minimum is 28/48705180, forwarding
2020-08-21 13:53:34.326 CEST [33594] DEBUG: find_in_dynamic_libpath: trying "/usr/pgsql-12/lib/pgoutput"
2020-08-21 13:53:34.326 CEST [33594] DEBUG: find_in_dynamic_libpath: trying "/usr/pgsql-12/lib/pgoutput.so"
2020-08-21 13:53:34.326 CEST [33594] LOG: starting logical decoding for slot "repli_sub"
2020-08-21 13:53:34.326 CEST [33594] DETAIL: Streaming transactions committing after 28/48705180, reading WAL from 28/322FEDB8.
2020-08-21 13:53:34.326 CEST [33594] DEBUG: switched to timeline 1 valid until 0/0
2020-08-21 13:53:34.326 CEST [33594] DEBUG: sending replication keepalive
2020-08-21 13:53:34.326 CEST [33594] LOG: logical decoding found consistent point at 28/322FEDB8
2020-08-21 13:53:34.326 CEST [33594] DETAIL: There are no running transactions.
2020-08-21 13:53:34.326 CEST [33594] DEBUG: write 28/48705180 flush 28/48705180 apply 28/48705180 reply_time 2020-08-21 13:53:34.326751+02
2020-08-21 13:53:34.335 CEST [33594] DEBUG: spill 4096 changes in XID 191852506 to disk
2020-08-21 13:53:34.349 CEST [33594] DEBUG: spill 4096 changes in XID 191852506 to disk
2020-08-21 13:53:34.362 CEST [33594] DEBUG: spill 4096 changes in XID 191852506 to disk
2020-08-21 13:53:34.377 CEST [33594] DEBUG: spill 4096 changes in XID 191852506 to disk
2020-08-21 13:53:34.391 CEST [33594] DEBUG: spill 4096 changes in XID 191852506 to disk
2020-08-21 13:53:34.406 CEST [33594] DEBUG: spill 4096 changes in XID 191852506 to disk
[...]
2020-08-21 13:53:38.388 CEST [33594] DEBUG: spill 4096 changes in XID 191852506 to disk
2020-08-21 13:53:38.405 CEST [33594] DEBUG: spill 4096 changes in XID 191852506 to disk
2020-08-21 13:53:38.423 CEST [33594] DEBUG: spill 4096 changes in XID 191852506 to disk
2020-08-21 13:53:38.438 CEST [33594] DEBUG: purged committed transactions from 0 to 0, xmin: 191852506, xmax: 191852506
2020-08-21 13:53:38.438 CEST [33594] DEBUG: xmin: 191852506, xmax: 191852506, oldest running: 191852506, oldest xmin: 191852506
2020-08-21 13:53:38.439 CEST [33594] DEBUG: purged committed transactions from 0 to 0, xmin: 191852506, xmax: 191852506
2020-08-21 13:53:38.439 CEST [33594] DEBUG: xmin: 191852506, xmax: 191852506, oldest running: 191852506, oldest xmin: 191852506
2020-08-21 13:53:38.501 CEST [33594] DEBUG: found top level transaction 191852506, with catalog changes
2020-08-21 13:53:38.501 CEST [33594] DEBUG: adding a new snapshot to 191852506 at 28/48705180
2020-08-21 13:53:38.766 CEST [33594] DEBUG: spill 1953 changes in XID 191852506 to disk
2020-08-21 13:54:12.207 CEST [33594] DEBUG: restored 4096/976801 changes from disk
2020-08-21 13:54:48.343 CEST [33594] DEBUG: restored 4096/976801 changes from disk
2020-08-21 13:55:25.801 CEST [33594] DEBUG: restored 4096/976801 changes from disk
2020-08-21 13:56:00.470 CEST [33594] DEBUG: restored 4096/976801 changes from disk
2020-08-21 13:56:33.711 CEST [33594] DEBUG: restored 4096/976801 changes from disk
[...]

This just keeps going with a message roughly every 30 seconds.
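Some arithmetic on those numbers: 976801 spilled changes restored in batches of 4096 means about 976801 / 4096 ≈ 239 batches, and at roughly 30 to 35 seconds per batch a single decoding pass over this one commit takes on the order of two hours. That is far beyond the default wal_receiver_timeout of 60 seconds, which is presumably why, before I raised the timeouts, the apply worker kept reconnecting only to run into "replication slot "repli_sub" is active for PID 23911".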
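For anyone reproducing this, the stuck slot can be watched from the primary with nothing more than the standard pg_replication_slots view and the WAL functions (nothing here is specific to my setup):

  SELECT slot_name, active, active_pid,
         restart_lsn, confirmed_flush_lsn,
         pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn) AS lag_bytes
  FROM pg_replication_slots
  WHERE slot_name = 'repli_sub';

While the WAL sender grinds through the spilled transaction, one would expect active_pid to stay at its PID (23911 in the log above) and confirmed_flush_lsn not to advance, so lag_bytes only grows.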
A typical stack trace of the WAL sender looks like this:

#0  0x00000000008c7d87 in hash_seq_search (status=status@entry=0x7ffc4e5446e0) at dynahash.c:1428
#1  0x00000000008b54c8 in RelfilenodeMapInvalidateCallback (arg=<optimized out>, relid=386381) at relfilenodemap.c:79
#2  0x00000000008a6786 in LocalExecuteInvalidationMessage (msg=0x7ff297f62b38) at inval.c:595
#3  0x0000000000750ba6 in ReorderBufferExecuteInvalidations (rb=<optimized out>, txn=<optimized out>) at reorderbuffer.c:2149
#4  ReorderBufferCommit (rb=0x1b270a0, xid=xid@entry=191852506, commit_lsn=173014012288, end_lsn=<optimized out>, commit_time=commit_time@entry=651321281503986, origin_id=origin_id@entry=0, origin_lsn=0) at reorderbuffer.c:1770
#5  0x0000000000746a0a in DecodeCommit (xid=191852506, parsed=0x7ffc4e544aa0, buf=0x7ffc4e544c50, ctx=0x1b11120) at decode.c:640
#6  DecodeXactOp (ctx=0x1b11120, buf=buf@entry=0x7ffc4e544c50) at decode.c:248
#7  0x0000000000746e22 in LogicalDecodingProcessRecord (ctx=0x1b11120, record=0x1b11398) at decode.c:117
#8  0x00000000007581e5 in XLogSendLogical () at walsender.c:2848
#9  0x000000000075a3f3 in WalSndLoop (send_data=send_data@entry=0x758180 <XLogSendLogical>) at walsender.c:2199
#10 0x000000000075b12c in StartLogicalReplication (cmd=0x7ffc4e544d20) at walsender.c:1129
#11 exec_replication_command (cmd_string=cmd_string@entry=0x1a425f0 "START_REPLICATION SLOT \"repli_sub\" LOGICAL 28/48643248 (proto_version '1', publication_names '\"repli_pub\"')") at walsender.c:1545
#12 0x00000000007a72d8 in PostgresMain (argc=<optimized out>, argv=argv@entry=0x1a99298, dbname=<optimized out>, username=<optimized out>) at postgres.c:4243
#13 0x0000000000734ef2 in BackendRun (port=0x1a90520) at postmaster.c:4448
#14 BackendStartup (port=0x1a90520) at postmaster.c:4139
#15 ServerLoop () at postmaster.c:1704
#16 0x0000000000735df5 in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0x1a3c1e0) at postmaster.c:1377
#17 0x00000000004cd6a1 in main (argc=3, argv=0x1a3c1e0) at main.c:228

"perf top --no-children --call-graph=fp --dsos=/usr/pgsql-12/bin/postgres" shows:

+    8.10%  [.] hash_seq_search
+    4.19%  [.] LocalExecuteInvalidationMessage
+    3.49%  [.] hash_search_with_hash_value
+    3.37%  [.] CatCacheInvalidate
+    1.65%  [.] CallSyscacheCallbacks
+    0.93%  [.] uint32_hash
+    0.82%  [.] ReorderBufferCommit
+    0.79%  [.] hash_search
+    0.76%  [.] RelfilenodeMapInvalidateCallback
     0.63%  [.] InvalidateCatalogSnapshot
+    0.62%  [.] SysCacheInvalidate

What could be causing this?

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com