Logical Replication disconnecting and taking a while to return back.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi! I hope you are well.
I'm trying to understand a (strange, I think) behavior here when we using logical replication in PostgreSQL 14.11.   This only happens when using the streaming = on parameter on the subscription.  If the value is off, the problem doesn't happen.
This happens in the AWS RDS PostgreSQL 14.11, but I reproduced the same behavior in my local environment using MacOS and Docker (Rancher).
Below, are the steps that are used to reproduce this behavior:
  • Create a logical replication using a publication/subscription pair between two different PostgreSQL 14.11 instances on a different host, using the option streaming = on in the subscription;
  • Do a large update in a long transaction on a table or tables controlled by this subscription.  You must do a huge update, that will cause an autovacuum, with an index cleanup phase, after the transaction commit;
  • In this case, the replication slot will go down and take a while to return back.  I will show some pieces of logs below;
    • This appears on the publicator's log file when replication slot stops, by a walsender timeout:
      2024-06-27 10:47:19 UTC:10.65.xx.131(24376):replicator@DB20:[17466]:LOG:  terminating walsender process due to replication timeout
      2024-06-27 10:47:19 UTC:10.65.xx.131(24376):replicator@DB20:[17466]:STATEMENT:  START_REPLICATION SLOT "subs_rept_opr1_02" LOGICAL 1AA6/3D6BC570 (proto_version '2', streaming 'on', publication_names '"publi_rept_opr1_02"')
      2024-06-27 10:47:19 UTC:10.65.xx.131(24376):replicator@DB20:[17466]:LOG:  disconnection: session time: 1650:28:29.615 user=replicator database=DB20 host=10.65.xx.131 port=24376

    • At this time, we can't see any messages on the subscriber log files.   The messages only appear on subscriber log files when replication comes back online again, like the subscriber piece of log below:
      2024-06-27 12:26:01 UTC::@:[25739]:ERROR:  could not receive data from WAL stream: SSL connection has been closed unexpectedly
      2024-06-27 12:26:01 UTC::@:[396]:LOG:  background worker "logical replication worker" (PID 25739) exited with exit code 1
      2024-06-27 12:26:01 UTC::@:[7325]:LOG:  logical replication apply worker for subscription "subs_rept_opr1_02" has started

      2024-06-27 12:39:49 UTC::@:[7325]:ERROR:  could not receive data from WAL stream: server closed the connection unexpectedly
                      This probably means the server terminated abnormally
                      before or while processing the request.
              server closed the connection unexpectedly
                      This probably means the server terminated abnormally
                      before or while processing the request.
              server closed the connection unexpectedly
                      This probably means the server terminated abnormally
                      before or while processing the request.
      2024-06-27 12:39:49 UTC::@:[396]:LOG:  background worker "logical replication worker" (PID 7325) exited with exit code 1
      2024-06-27 12:39:49 UTC::@:[8881]:LOG:  logical replication apply worker for subscription "subs_rept_opr1_02" has started

  • Remember this only happens if the option streaming = on in the related subscription.  I tested the same behavior using streaming = off (default) and the problem didn't happen.
  • I'm not sure, and I don't know if this can be related to the autovacuum process running after the long update commit.  The picture below shows the autovacuum phases (on our monitoring tool) in the updated table.
    image.png
  • Logical replication remains stopped from 2024-06-27 10:47:19 UTC to 2024-06-27 12:26:01 UTC, without any new messages in the log files;
  • At 12:39:49 UTC we had problems again, but the logical replication recovery was shorter;
  • No replication data was lost in the process.
I would appreciate if you could help or advise me on understanding what happened here.
All my best!!

Lucio Chiessi

Staff Database Administrator     

Trustly, Inc.

M: +55 27 996360276

  

    


Please read our privacy policy here on how we process your personal data in accordance with the General Data Protection Regulation (EU) 2016/679 (the “GDPR”) and other applicable data protection legislation.

[Index of Archives]     [Postgresql Home]     [Postgresql General]     [Postgresql Performance]     [Postgresql PHP]     [Postgresql Jobs]     [PHP Users]     [PHP Databases]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Databases]     [Yosemite Forum]

  Powered by Linux