WAL replay stuck in hot standby when performing the backup

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi there,

  I have noticed that WAL replay blocks in a hot standby slave when performing the backup.
I am using on both PostgreSQL 9.4.23 on Redhat.

During the whole backup procedure (which takes more than 24 hours), the data onto the hot standby node is not updated.

Is this behaviour correct?

My idea is to confine batch workloads, backups and less critical OLTP queries on the hot standby.
If during the backup live data is not updated in the hot standby node, I have to re-plan the environment.

Details:

- Replication statistics show that replay location is lagging far behind the flush location (on master side):

  select * from pg_stat_replication;

    pid  | usesysid | usename  | application_name | client_addr | client_hostname | client_port |         backend_start         | backend_xmin |   state   | sent_location | write_location | flush_location | replay_location | sync_priority | sync_state 
  -------+----------+----------+------------------+-------------+-----------------+-------------+-------------------------------+--------------+-----------+---------------+----------------+----------------+-----------------+---------------+------------
   15477 |       10 | postgres | walreceiver      | 127.0.0.1   |                 |       50612 | 2019-07-15 17:34:36.759306+00 |   1152223785 | streaming | 274D/804CBCB8 | 274D/804CBCB8  | 274D/804CBCB8  | 2744/30F84028   |             0 | async
  (1 row)

- Replay lag is about 19 hours and running (on hot standby side):

  select now()-pg_last_xact_replay_timestamp() as replication_lag;

   replication_lag 
  -----------------
   19:12:11.2289
  (1 row)

  SELECT
  pg_last_xlog_receive_location() receive,
  pg_last_xlog_replay_location() replay,
  (
   extract(epoch FROM now()) -
   extract(epoch FROM pg_last_xact_replay_timestamp())
  )::int lag;

      receive    |    replay     |  lag  
  ---------------+---------------+-------
   274E/BAF0B610 | 2744/30F84028 | 69157
  (1 row)

  The only query currently running (not in idle state) on the slave is a COPY:

  select * from pg_stat_activity;

   datid |   datname   |  pid  | usesysid |    usename     |    application_name    | client_addr  | client_hostname | client_port |         backend_start         |          xact_start           |          query_start          |         state_change          | waiting | state  | backend_xid | backend_xmin |                                                                                                              query                                                                                                             
  -------+-------------+-------+----------+----------------+------------------------+--------------+-----------------+-------------+-------------------------------+-------------------------------+-------------------------------+-------------------------------+---------+--------+-------------+--------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   16993 | mydb        | 15604 |  3226933 | rousr          | PostgreSQL JDBC Driver | 10.192.87.91 |                 |       41491 | 2019-07-15 17:35:57.01458+00  |                               | 2019-07-16 12:55:58.21023+00  | 2019-07-16 12:55:58.210251+00 | f       | idle   |             |              | COMMIT
   16993 | mydb        | 18868 |  3226603 | amoretto       | psql93                 | 10.174.62.66 |                 |       54106 | 2019-07-16 12:59:30.122865+00 | 2019-07-16 13:00:18.807889+00 | 2019-07-16 13:00:18.807889+00 | 2019-07-16 13:00:18.807893+00 | f       | active |             |   1152301732 | select * from pg_stat_activity;
   16993 | mydb        |  2559 |  3226603 | amoretto       | psql93                 | 10.174.62.66 |                 |       52338 | 2019-07-16 10:57:03.118782+00 |                               | 2019-07-16 10:58:27.278095+00 | 2019-07-16 10:58:27.279752+00 | f       | idle   |             |              | select * from pg_stat_replication;
   16993 | mydb        | 15736 |  3226603 | amoretto       | pg_dump                | 10.192.87.90 |                 |        4281 | 2019-07-15 17:36:42.540157+00 | 2019-07-15 17:36:42.566704+00 | 2019-07-16 12:51:04.488305+00 | 2019-07-16 12:51:04.488307+00 | f       | active |             |   1152223785 | COPY public.c_box_hw_dev_app_hist (box_id, hardware_id, device_id, application_id, sequence, history_time, history_info, usr_id, usr_info, archived, is_history_info_zip, description, parent_application_name, iid) TO stdout;
(4 rows)

  select * from pg_stat_database_conflicts;
   datid |   datname   | confl_tablespace | confl_lock | confl_snapshot | confl_bufferpin | confl_deadlock 
  -------+-------------+------------------+------------+----------------+-----------------+----------------
       1 | template1   |                0 |          0 |              0 |               0 |              0
   12998 | template0   |                0 |          0 |              0 |               0 |              0
   13003 | postgres    |                0 |          0 |              0 |               0 |              0
   16993 | mydb        |                0 |          0 |              0 |               0 |              0
  (4 rows)

  select * from pg_stat_database;
   datid |   datname   | numbackends | xact_commit | xact_rollback | blks_read | blks_hit | tup_returned | tup_fetched | tup_inserted | tup_updated | tup_deleted | conflicts | temp_files | temp_bytes | deadlocks | blk_read_time | blk_write_time |          stats_reset          
  -------+-------------+-------------+-------------+---------------+-----------+----------+--------------+-------------+--------------+-------------+-------------+-----------+------------+------------+-----------+---------------+----------------+-------------------------------
       1 | template1   |           0 |           0 |             0 |         0 |        0 |            0 |           0 |            0 |           0 |           0 |         0 |          0 |          0 |         0 |             0 |              0 | 
   12998 | template0   |           0 |           0 |             0 |         0 |        0 |            0 |           0 |            0 |           0 |           0 |         0 |          0 |          0 |         0 |             0 |              0 | 
   13003 | postgres    |           0 |           0 |             0 |         0 |        0 |            0 |           0 |            0 |           0 |           0 |         0 |          0 |          0 |         0 |             0 |              0 | 
   16993 | mydb        |           3 |        2710 |             2 |      2499 |   333071 |       213588 |       72658 |            0 |           0 |           0 |         0 |          0 |          0 |         0 |             0 |              0 | 2019-07-15 17:31:51.600354+00
  (4 rows)

  select * from pg_stat_bgwriter;
   checkpoints_timed | checkpoints_req | checkpoint_write_time | checkpoint_sync_time | buffers_checkpoint | buffers_clean | maxwritten_clean | buffers_backend | buffers_backend_fsync | buffers_alloc |          stats_reset         
  -------------------+-----------------+-----------------------+----------------------+--------------------+---------------+------------------+-----------------+-----------------------+---------------+-------------------------------
                4636 |               1 |               1107064 |                    9 |             378395 |             0 |                0 |          460498 |                     0 |       1432887 | 2019-07-15 17:31:22.319099+00
  (1 row)

I hope that someone can give me some hint on this topic, it would be appreciated!

Thanks in advance,

  Andrea

Andrea Moretto
moretto.andrea@xxxxxxxxx
-------------------------------------------------------
CONFIDENTIALITY NOTICE
This message and its attachments are addressed solely to the persons
above and may contain confidential information. If you have received
the message in error, be informed that any use of the content hereof
is prohibited. Please return it immediately to the sender and delete
the message.
-------------------------------------------------------







[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux