Hi pgsql-admins, We observed an instance of growing replication delay (measured using pg_current_wal_lsn() - confirmed_flush_lsn) for a logical replication slot used by Debezium that strongly correlated with autovacuum activity. Logical replication delay started growing after a burst in deletes, performed in batches/transactions of 1,000 rows over a span of 40 minutes. This delete resulted in 68 million dead rows, representing 20% of total rows in the given table, which triggered an autovacuum. After the deletes finished, logical replication delay continued to grow from normal write activity as long as autovacuum was running. During this time WALs grew to 68 GiB. When autovacuum completed almost 3 hours later, logical replication delay immediately started going down, and there was an increase in network traffic to Debezium as replication started catching up. We did not see CPU or I/O saturation during the time that logical replication struggled to keep up. Would this be an indicator to increase logical_decoding_work_mem? If not, what symptoms would indicate a need to increase this parameter? Also, what could cause this correlated behavior? Brainstorming ideas on potential causes: - Does the WAL activity generated by vacuums slow down walsender or logical decoding enough to cause this issue? - Do autovacuum and logical replication share resources? - Does a running vacuum block logical replication altogether for a given table? - Does the presence of dead rows slow down logical replication? - Could the large number of accumulated WAL files slow down logical replication (e.g. due to syscalls taking longer to return)? - Could causality be reversed: instead of autovacuum causing replication delay, did replication delay from the large delete cause autovacuum to not finish until the delete had fully replicated? Thanks, Alex