389-ds-base-1.2.10.14-3.2.el5: replication latency >= 300s, 400s, 900s..

Justin Piszcz <jpiszcz@xxxxxxxxxxxxxxx> · Mon, 18 Nov 2013 06:12:47 -0500

Hi,

Problem: During high levels of writes to the master server, the
replication_latency on _some_ LDAP search hosts rises (and in some
cases does not recover) unless the host is removed from service or
slapd is restarted.

Version: 389-ds-base-1.2.10.14-3.2.el5 (x86_64)

During normal operation, typical replication latency is <= 60 seconds
in our environment.

Questions:
1. Do slapd's BDB backend databases suffer from fragmentation over
time?  Will periodic restarts of slapd improve performance?

2. The search hosts run on a variety of hardware, and the problem
occurs more or less across the board.  We've tried host-level tweaks
(IO scheduler and other system tuning), but the problem appears(?) to
be at the application level.  Is there any system tuning or slapd
tuning that I am missing that would avoid this problem?

3. Further, is there support, or are there plans to support,
network-level compression of the replication data from the
master->hubs->search hosts?

4. Aside from removing the host from active service and/or restarting
slapd, what are the recommended best practices to avoid this problem
happening in the first place?  Currently we are looking into
automation to remove hosts from service when latency breaches a
certain threshold, but why does this occur, and is there any way to
tweak/tune or limit search performance until the replication latency
recovers?
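
For illustration, the threshold-based removal mentioned in question 4
could be sketched roughly as follows.  The function name, the 300
second threshold and the "three consecutive samples" rule are
assumptions made up for this example; they are not 389-ds settings.

```python
def should_remove(samples, threshold_sec=300, consecutive=3):
    """Decide whether a search host should be pulled from service.

    samples       -- replication latency readings in seconds, oldest first
    threshold_sec -- hypothetical latency limit (assumption)
    consecutive   -- how many of the latest readings must exceed the limit
                     (assumption, used to ignore a single spike)
    """
    if len(samples) < consecutive:
        # Not enough data yet to make a decision.
        return False
    # Only trigger when every one of the latest readings is over the limit.
    return all(s > threshold_sec for s in samples[-consecutive:])
```

Requiring several consecutive readings over the limit is one way to
keep a single slow sample from taking a host out of rotation.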

Example: Problem begins:
replication_latency_sec=424

[ .. after a few min .. ]

replication_latency_sec=549
replication_latency_sec=553
replication_latency_sec=555
replication_latency_sec=561
replication_latency_sec=564

[ .. after 10 min or so .. ]

replication_latency_sec=780
replication_latency_sec=763
replication_latency_sec=768
replication_latency_sec=774
replication_latency_sec=779
replication_latency_sec=785
replication_latency_sec=791
replication_latency_sec=796
replication_latency_sec=802
replication_latency_sec=807
replication_latency_sec=813
replication_latency_sec=818
replication_latency_sec=824
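
As a rough sketch, readings in the format above could be parsed and
checked for recovery like this.  The helper names and the five-sample
comparison window are assumptions for illustration only, and the
sampling interval is not known here.

```python
import re

# Matches lines such as "replication_latency_sec=824".
LATENCY_RE = re.compile(r"^replication_latency_sec=(\d+)$")


def parse_latency(lines):
    """Extract the latency values (seconds) from monitoring output lines."""
    values = []
    for line in lines:
        match = LATENCY_RE.match(line.strip())
        if match:
            values.append(int(match.group(1)))
    return values


def is_recovering(values, window=5):
    """Return True if the newest reading is lower than the reading
    `window` samples earlier (window size is an assumption)."""
    if len(values) <= window:
        # Too few samples to compare against.
        return False
    return values[-1] < values[-1 - window]
```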

Justin.
--
389 users mailing list
389-users@xxxxxxxxxxxxxxxxxxxxxxx
https://admin.fedoraproject.org/mailman/listinfo/389-users