Re: High replication lag - Stream Replication

Shreeyansh Dba <shreeyansh2014@xxxxxxxxx> · Mon, 13 Nov 2017 18:29:23 +0530

On Thu, Nov 9, 2017 at 5:36 PM, Marcelo Kruger <marcelo.kruger@xxxxxxxxxxxxx> wrote:
Hi Shreeyansh,

I changed the log_min_duration_statement parameter to -1, but there was no improvement in the database. The application of the archives by Stream Replication is still slow.

In my usage scenario we performed large table creation operations. We create several tables in a short time with more than 40GB. We currently have a 500GB growth per month in the database.

Transactions in one day can generate more than 30000 archives (480GB).

Are there any other parameters that can be changed to improve performance in the application of archives?

2017-11-09 2:43 GMT-02:00 Shreeyansh Dba <shreeyansh2014@xxxxxxxxx>:
On Wed, Nov 8, 2017 at 9:34 PM, Marcelo Kruger <marcelo.kruger@xxxxxxxxxxxxx> wrote:
Hi Shreeyansh,
Thank you for your help.

The lag is not occurring in the recording of the archive in production, nor in sending to StandBy. The lag is occurring in the archive application in StandBy. And this slowness increases with running queries in the StandBy database.

The process of recovering is consuming little CPU, and little I / O.

postgres: startup process   recovering 0000000100001B4300000006  

This parameter log_min_duration_statement should be changed on the StandBy server?

The parameter should be changed On both master as well as slave as this 
result towards heavy load on the server causing slower execution of the 
queries due to higher resource consumption.

This parameter normally 
set capturing bottleneck queries by setting the higher values to 
minimize the write activities and load on the servers.

2017-11-08 12:09 GMT-02:00 Shreeyansh Dba <shreeyansh2014@xxxxxxxxx>:

On Wed, Nov 8, 2017 at 7:08 PM, Marcelo Kruger <marcelo.kruger@xxxxxxxxxxxxx> wrote:
Good afternoon,

I have two servers hosted on Azure with PostgreSQL 9.6 databases. One database is production, the other is standby.
For replication I use stream replication between servers. However the application of the archives in the standby database is slow, always generating replication lag.

The replication lag increases considerably when queries are performed on the StandBy database. I wonder if there is any setting that can be made to improve this performance.

Server Production
SO: CentOS Linux release 7.4.1708 (Core) 64Bits
CPU: Intel(R) Xeon(R) CPU E5-2698B v3 @ 2.00GHz - 15 Cores
Memory: 240GB
I/O: 17TB RAID0 (17 Disks (1TB for disk) SSD with read and write cache enabled)

Server StandBy
SO: CentOS Linux release 7.4.1708 (Core) 64Bits
CPU: Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz - 20 Cores
Memory: 140GB
I/O: 22TB RAID0 (22 Disks (1TB for disk) SSD with read and write cache enabled)

Configuration PostgreSQL Production

postgres.conf

listen_addresses = '*' 
port = 5433 
max_connections = 2000
superuser_reserved_connections = 3 
shared_buffers = 56GB 
work_mem = 29360kB
maintenance_work_mem = 4GB 
autovacuum_work_mem = 8GB 
max_stack_depth = 5MB 
dynamic_shared_memory_type = posix 
bgwriter_delay = 10ms 
bgwriter_lru_maxpages = 700 
bgwriter_lru_multiplier = 2.0 
fsync = off 
synchronous_commit = on 
full_page_writes = off 
wal_buffers = 1500MB 
wal_writer_delay = 1ms 
checkpoint_timeout = 10min 
max_wal_size = 4GB
min_wal_size = 2GB
checkpoint_completion_target = 0.9 
effective_cache_size = 168GB
default_statistics_target = 500
log_destination = 'csvlog' 
logging_collector = on 
log_directory = 'pg_log' 
log_filename = 'postgresql-%w.log' 
log_file_mode = 0640 
log_truncate_on_rotation = on 
log_rotation_age = 1d 
log_rotation_size = 600MB 
log_min_duration_statement = 0 
log_checkpoints = off
log_connections = off
log_disconnections = off
log_duration = off
log_line_prefix = '%t [%p]: [%l-1] db=%d,user=%u,app=%a,client=%h' 
log_lock_waits = on 
log_timezone = 'Brazil/East'
autovacuum = on 
log_autovacuum_min_duration = -1 
autovacuum_max_workers = 12 
autovacuum_naptime = 30s 
autovacuum_vacuum_threshold = 20 
autovacuum_analyze_threshold = 20 
autovacuum_vacuum_scale_factor = 0.2 
autovacuum_analyze_scale_factor = 0.1 
autovacuum_vacuum_cost_delay = 20
autovacuum_vacuum_cost_limit = 800
vacuum_cost_delay = 20 
vacuum_cost_limit = 800
datestyle = 'iso, mdy'
timezone = 'Brazil/East'
lc_messages = 'en_US.UTF-8' 
lc_monetary = 'en_US.UTF-8' 
lc_numeric = 'en_US.UTF-8' 
lc_time = 'en_US.UTF-8' 
default_text_search_config = 'pg_catalog.english'
max_locks_per_transaction = 256
pgpool.pg_ctl = '/usr/pgsql-9.6/bin/pg_ctl'
effective_io_concurrency = 10
wal_level = hot_standby
max_wal_senders = 10
max_replication_slots = 3
wal_keep_segments = 38400

Configuration PostgreSQL StandBy

postgres.conf

listen_addresses = '*' 
port = 5433 
max_connections = 2000
superuser_reserved_connections = 3 
shared_buffers = 10GB 
work_mem = 183500kB
maintenance_work_mem = 4GB 
autovacuum_work_mem = 8GB 
max_stack_depth = 5MB 
dynamic_shared_memory_type = posix 
bgwriter_delay = 10ms 
bgwriter_lru_maxpages = 700 
bgwriter_lru_multiplier = 2.0 
fsync = off 
synchronous_commit = on 
full_page_writes = off 
wal_buffers = 16MB 
wal_writer_delay = 1ms 
checkpoint_timeout = 10min 
max_wal_size = 4GB
min_wal_size = 2GB
checkpoint_completion_target = 0.9 
effective_cache_size = 104GB
default_statistics_target = 500
log_destination = 'csvlog' 
logging_collector = on 
log_directory = 'pg_log' 
log_filename = 'postgresql-%w.log' 
log_file_mode = 0640 
log_truncate_on_rotation = on 
log_rotation_age = 1d 
log_rotation_size = 600MB 
log_min_duration_statement = 0 
log_checkpoints = off
log_connections = off
log_disconnections = off
log_duration = off
log_line_prefix = '%t [%p]: [%l-1] db=%d,user=%u,app=%a,client=%h' 
log_lock_waits = on 
log_timezone = 'Brazil/East'
autovacuum = on
log_autovacuum_min_duration = -1 
autovacuum_max_workers = 3
autovacuum_naptime = 30s 
autovacuum_vacuum_threshold = 20 
autovacuum_analyze_threshold = 20 
autovacuum_vacuum_scale_factor = 0.2 
autovacuum_analyze_scale_factor = 0.1 
autovacuum_vacuum_cost_delay = 20
autovacuum_vacuum_cost_limit = 800
vacuum_cost_delay = 20 
vacuum_cost_limit = 800
datestyle = 'iso, mdy'
timezone = 'Brazil/East'
lc_messages = 'en_US.UTF-8' 
lc_monetary = 'en_US.UTF-8' 
lc_numeric = 'en_US.UTF-8' 
lc_time = 'en_US.UTF-8' 
default_text_search_config = 'pg_catalog.english'
max_locks_per_transaction = 256
pgpool.pg_ctl = '/usr/pgsql-9.6/bin/pg_ctl'
effective_io_concurrency = 10
max_worker_processes = 18
hot_standby = on
hot_standby_feedback = on

recovery.conf 

standby_mode       = 'on'
primary_slot_name  = 'replicacao'
primary_conninfo   = 'host=bdreplica00 port=5433 user=replicator password='
trigger_file       = '/var/lib/pgsql/9.6/data/master_db.conf'

-- 

Marcelo Krüger
Data Integration
Office: +55 48 3333-2030 | Mobile: +55 48 999679585
R. Patrício Farias, 131 - Itacorubi - Florianópolis - 3º andar |  88034-132 | Santa Catarina, Brasil  
São Paulo | Florianópolis | New York

Conteúdo confidencial. Caso você não seja o real destinatário deste e-mail por favor notifique o remetente e elimine esta mensagem. Privileged and confidential. If you are not the intended addressee of this e-mail please notify the sender and promptly delete this message.

Hi Marcelo Kruger,

From server configuration details provided, looks like a heavy trans DB & more logging enabled which might be resulting more write act on Master and Slave.
At our first glance, we see log_min_duration_statement is set to 0 which might be writing all the queries into the log resulting heavy writing activity that maybe cause for replication lag.

Hope this helps you.

-- 

-- 

Marcelo Krüger
Data Integration
Office: +55 48 3333-2030 | Mobile: +55 48 999679585
R. Patrício Farias, 131 - Itacorubi - Florianópolis - 3º andar |  88034-132 | Santa Catarina, Brasil  
São Paulo | Florianópolis | New York

Conteúdo confidencial. Caso você não seja o real destinatário deste e-mail por favor notifique o remetente e elimine esta mensagem. Privileged and confidential. If you are not the intended addressee of this e-mail please notify the sender and promptly delete this message.

-- 

-- 

Marcelo Krüger
Data Integration
Office: +55 48 3333-2030 | Mobile: +55 48 999679585
R. Patrício Farias, 131 - Itacorubi - Florianópolis - 3º andar |  88034-132 | Santa Catarina, Brasil  
São Paulo | Florianópolis | New York

Conteúdo confidencial. Caso você não seja o real destinatário deste e-mail por favor notifique o remetente e elimine esta mensagem. Privileged and confidential. If you are not the intended addressee of this e-mail please notify the sender and promptly delete this message.

Further to analysis, we fear it may not be a just parameter change.

we might have to make few strategies around db-archiving so that multiple table population in short time should happen during the off peak hours to avoid excessive generation of archives during peak hours.

However High volume database  like this needs careful planning and deep investigation in order to reduce  application archive generation

--