Hello,
I've learned that logical replication might have performance problem if there are lot of sub transactions within transaction (at least because it enforces spill files in pg_replslot and if there are many - like 80mio, EXT4 did not perform well - in my case it was caused bu misuse of exception handling in trigger function) and once we get rid of this usage pattern logical replication works like a charm.
Until we have applied schema migration patch creating 700+ tables by flyway, therefore whole file was applied in single transaction (like psql -1 -f xxxx) and it effectively blocked our replication for 22 hours. No failures, wal sender processed did not consumed huge amount of memory (what happened in past wiht 80mi sub-trans caused by trigger), all wal senders did IO very rarely, most of the time they dwell on CPU.
The spill files were not so big:
# du -sh xid-924361123-lsn-FBC-C7000000.snap
224K xid-924361123-lsn-FBC-C7000000.snap
224K xid-924361123-lsn-FBC-C7000000.snap
It takes about 1 minute (all the time 100% CPU used by was sender) to move to following file.
Version, OS:
PostgreSQL 10.6 (Ubuntu 10.6-1.pgdg16.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 5.4.0-6ubuntu1~16.04.10) 5.4.0 20160609, 64-bit
Perf:
Overhead Command Shared Object Symbol
57.79% postgres postgres [.] hash_seq_search
14.70% postgres postgres [.] hash_search_with_hash_value
5.93% postgres postgres [.] LocalExecuteInvalidationMessage
3.60% postgres postgres [.] RelfilenodeMapInvalidateCallback
2.99% postgres postgres [.] hash_uint32
2.50% postgres postgres [.] hash_search
1.91% postgres postgres [.] CatCacheInvalidate
1.83% postgres postgres [.] CallSyscacheCallbacks
1.37% postgres postgres [.] hash_seq_init
1.24% postgres postgres [.] hash_seq_term
1.02% postgres postgres [.] RelationCacheInvalidateEntry
0.97% postgres postgres [.] ReorderBufferCommit
0.97% postgres pgoutput.so [.] rel_sync_cache_relation_cb
0.78% postgres postgres [.] uint32_hash
0.73% postgres postgres [.] PlanCacheRelCallback
0.60% postgres postgres [.] InvalidateCatalogSnapshot
0.58% postgres postgres [.] SysCacheInvalidate
0.35% postgres postgres [.] GetCurrentTransactionNestLevel
0.13% postgres pgoutput.so [.] _init
0.00% postgres [kernel.kallsyms] [k] __do_softirq
0.00% postgres [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore
0.00% postgres postgres [.] ResourceOwnerEnlargeRelationRefs
57.79% postgres postgres [.] hash_seq_search
14.70% postgres postgres [.] hash_search_with_hash_value
5.93% postgres postgres [.] LocalExecuteInvalidationMessage
3.60% postgres postgres [.] RelfilenodeMapInvalidateCallback
2.99% postgres postgres [.] hash_uint32
2.50% postgres postgres [.] hash_search
1.91% postgres postgres [.] CatCacheInvalidate
1.83% postgres postgres [.] CallSyscacheCallbacks
1.37% postgres postgres [.] hash_seq_init
1.24% postgres postgres [.] hash_seq_term
1.02% postgres postgres [.] RelationCacheInvalidateEntry
0.97% postgres postgres [.] ReorderBufferCommit
0.97% postgres pgoutput.so [.] rel_sync_cache_relation_cb
0.78% postgres postgres [.] uint32_hash
0.73% postgres postgres [.] PlanCacheRelCallback
0.60% postgres postgres [.] InvalidateCatalogSnapshot
0.58% postgres postgres [.] SysCacheInvalidate
0.35% postgres postgres [.] GetCurrentTransactionNestLevel
0.13% postgres pgoutput.so [.] _init
0.00% postgres [kernel.kallsyms] [k] __do_softirq
0.00% postgres [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore
0.00% postgres postgres [.] ResourceOwnerEnlargeRelationRefs
My suspect is that psql (of flyway) while processing a file with many create table and grant statements in single transaction mode might create sub transactions, but I haven't found it in docs.
Does you have similar experience? What is root cause for this issue?
Workaround is clear - do not create that large amount of DDL in single transaction, but it would be nice to know root cause to eliminate scripts/usage patterns causing this issue generally.
On the other hand, following transaction (100GB of spill files) , almost 150 mio rows deleted (and archived by statement trigger) behaves expected way - wal sender process was reading from disk (something like 130MB/s) and after ~20 minutes replication was in sync, all spill files were removed, so transaction changes volume itself does not seems to be any issue.
Thanks for any advice!
Kind regards Ales Zeleny
P.S.: PostgreSQL log facility is superb, having DDL statements logged and XID in log line prefix I was able to identify the issue source in this case.