Hi all. I’m wondering why do I get conflicts with recovery on hot standby using replication slots and read commited isolation level? And if I start repeatable read transaction I don’t get any errors. Below is some diagnostics. I’m using 9.4.4 (upgrade is planned) from yum.postgresql.org packages on both master and standby. Configs are the same on both master and standby: rpopdb01d/postgres M # SELECT name, setting FROM pg_settings WHERE category LIKE 'Replication%' or category LIKE 'Write-Ahead Log'; name | setting ------------------------------+--------- hot_standby | on hot_standby_feedback | on max_replication_slots | 1 max_standby_archive_delay | 30000 max_standby_streaming_delay | 30000 max_wal_senders | 10 synchronous_standby_names | vacuum_defer_cleanup_age | 200000 wal_keep_segments | 64 wal_receiver_status_interval | 1 wal_receiver_timeout | 60000 wal_sender_timeout | 3000 (12 rows) Time: 1.583 ms rpopdb01d/postgres M # On the master I’ve created a physical replication slot and attached standby to it, I do see changing xmin and restart_lsn fields in pg_replication_slots view. rpopdb01d/postgres M # select * from pg_replication_slots ; slot_name | plugin | slot_type | datoid | database | active | xmin | catalog_xmin | restart_lsn ----------------------+--------+-----------+--------+----------+--------+------------+--------------+--------------- rpopdb01e_domain_com | [null] | physical | [null] | [null] | t | 2127399287 | [null] | 960B/415C79C8 (1 row) Time: 0.463 ms rpopdb01d/postgres M # When I start a read commited transaction on standby (or use autocommit mode, doesn’t matter) I still see that xmin in pg_replication_slots view on master increases. If I do run a heavy SELECT statement, at some point of time (presumably after vacuum_defer_cleanup_age expires) standby starts to lag replication apply and when it hits max_standby_streaming_delay I get 40001 sql code, either ERROR or FATAL: rpopdb01e/rpopdb R # SHOW transaction_isolation ; transaction_isolation ----------------------- read committed (1 row) Time: 0.324 ms rpopdb01e/rpopdb R # SELECT count(*) FROM big_table; ERROR: 40001: canceling statement due to conflict with recovery DETAIL: User query might have needed to see row versions that must be removed. LOCATION: ProcessInterrupts, postgres.c:2990 Time: 199791.339 ms rpopdb01e/rpopdb R # rpopdb01e/rpopdb R # SHOW transaction_isolation ; transaction_isolation ----------------------- read committed (1 row) Time: 0.258 ms rpopdb01e/rpopdb R # BEGIN; BEGIN Time: 0.067 ms rpopdb01e/rpopdb R # SELECT count(*) FROM big_table; FATAL: 40001: terminating connection due to conflict with recovery DETAIL: User was holding a relation lock for too long. LOCATION: ProcessInterrupts, postgres.c:2857 server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. The connection to the server was lost. Attempting reset: Succeeded. Time: 307864.830 ms rpopdb01e/rpopdb R # The behavior is the same as expected to be without using replication slots. But when I start repeatable read transaction xmin field in pg_replication_slots view on master freezes (while restart_lsn is still increasing) and I don’t get any replication lag and conflicts with recovery. When I end this transaction, xmin starts increasing again. rpopdb01e/rpopdb R # begin transaction isolation level repeatable read; BEGIN Time: 0.118 ms rpopdb01e/rpopdb R # SELECT count(*) FROM big_table; count ------------ 3106222429 (1 row) Time: 411944.889 ms rpopdb01e/rpopdb R # ROLLBACK; ROLLBACK Time: 0.269 ms rpopdb01e/rpopdb R # And that is what I expect. Am I missing something or is it expected behavior in read commited mode? Thanks in advance. |