pá 22. 3. 2019 v 10:49 odesílatel Aleš Zelený <zeleny.ales@xxxxxxxxx> napsal:
Hello,I've learned that logical replication might have performance problem if there are lot of sub transactions within transaction (at least because it enforces spill files in pg_replslot and if there are many - like 80mio, EXT4 did not perform well - in my case it was caused bu misuse of exception handling in trigger function) and once we get rid of this usage pattern logical replication works like a charm.Until we have applied schema migration patch creating 700+ tables by flyway, therefore whole file was applied in single transaction (like psql -1 -f xxxx) and it effectively blocked our replication for 22 hours. No failures, wal sender processed did not consumed huge amount of memory (what happened in past wiht 80mi sub-trans caused by trigger), all wal senders did IO very rarely, most of the time they dwell on CPU.The spill files were not so big:# du -sh xid-924361123-lsn-FBC-C7000000.snap
224K xid-924361123-lsn-FBC-C7000000.snapIt takes about 1 minute (all the time 100% CPU used by was sender) to move to following file.Version, OS:PostgreSQL 10.6 (Ubuntu 10.6-1.pgdg16.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 5.4.0-6ubuntu1~16.04.10) 5.4.0 20160609, 64-bitPerf:Overhead Command Shared Object Symbol
57.79% postgres postgres [.] hash_seq_search
14.70% postgres postgres [.] hash_search_with_hash_value
5.93% postgres postgres [.] LocalExecuteInvalidationMessage
3.60% postgres postgres [.] RelfilenodeMapInvalidateCallback
2.99% postgres postgres [.] hash_uint32
2.50% postgres postgres [.] hash_search
1.91% postgres postgres [.] CatCacheInvalidate
1.83% postgres postgres [.] CallSyscacheCallbacks
1.37% postgres postgres [.] hash_seq_init
1.24% postgres postgres [.] hash_seq_term
1.02% postgres postgres [.] RelationCacheInvalidateEntry
0.97% postgres postgres [.] ReorderBufferCommit
0.97% postgres pgoutput.so [.] rel_sync_cache_relation_cb
0.78% postgres postgres [.] uint32_hash
0.73% postgres postgres [.] PlanCacheRelCallback
0.60% postgres postgres [.] InvalidateCatalogSnapshot
0.58% postgres postgres [.] SysCacheInvalidate
0.35% postgres postgres [.] GetCurrentTransactionNestLevel
0.13% postgres pgoutput.so [.] _init
0.00% postgres [kernel.kallsyms] [k] __do_softirq
0.00% postgres [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore
0.00% postgres postgres [.] ResourceOwnerEnlargeRelationRefsMy suspect is that psql (of flyway) while processing a file with many create table and grant statements in single transaction mode might create sub transactions, but I haven't found it in docs.
Maybe you have disabled autocommit with ON_ERROR_ROLLBACK on
I don't know about another case, where psql can use subtransactions.
Pavel
Does you have similar experience? What is root cause for this issue?Workaround is clear - do not create that large amount of DDL in single transaction, but it would be nice to know root cause to eliminate scripts/usage patterns causing this issue generally.On the other hand, following transaction (100GB of spill files) , almost 150 mio rows deleted (and archived by statement trigger) behaves expected way - wal sender process was reading from disk (something like 130MB/s) and after ~20 minutes replication was in sync, all spill files were removed, so transaction changes volume itself does not seems to be any issue.Thanks for any advice!Kind regards Ales ZelenyP.S.: PostgreSQL log facility is superb, having DDL statements logged and XID in log line prefix I was able to identify the issue source in this case.