Re: 389-ds-base-1.2.10.14-3.2.el5: replication latency >= 300s, 400s, 900s..

On 11/18/2013 04:12 AM, Justin Piszcz wrote:
Hi,

Problem: During high levels of writes to the master server, replication
latency on _some_ LDAP search hosts rises (and in some cases does not
recover) unless the host is removed from service or slapd is restarted.

Version: 389-ds-base-1.2.10.14-3.2.el5 (x86_64)

There is a 1.2.11 release in epel5 testing - I would strongly encourage you to try this version - 1.2.10 is known to have problems and will be gone from epel5 soon.


During normal operation, typical replication latency is <= 60 seconds
in our environment.

How are you measuring this?


Questions:
1. Do slapd's BDB backend databases suffer from fragmentation over time?
Yes.
Will periodic restarts of slapd improve performance?
Maybe.  It really depends on what exactly the problem is.

2. There are search hosts running on a variety of hardware and it
happens more or less across the board.  We've tried specific tweaks on
the hosts with the IO scheduler and other system tuning but this
problem appears(?) to be at the application-level.  Is there any
system tuning that I am missing here or slapd tuning to avoid this
problem?
Again, it really depends.

3. Further, is there or are there plans to support network-level
compression on the replication data from the master->hubs->search
hosts?

The data on the wire already uses a fairly efficient encoding (BER).


4. Aside from removing the host from active service and/or restarting
slapd and removing it from service, what are the recommended best
practices to avoid this problem happening in the first place?
It depends.  What exactly is the problem?
Currently we are looking into automation to remove the hosts from
service automatically when latency breaches a certain threshold, but
why does this occur, and is there any way to tweak/tune or limit search
performance until the replication latency recovers?
How are you measuring latency?
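
For reference, one way latency is sometimes estimated is to compare the newest CSN seen on the supplier with the newest CSN seen on the consumer. This is an assumption about the setup, not something stated in this thread. The sketch below relies on the first 8 hex digits of a 389 DS CSN being the change time in seconds since the epoch. The function names and the sample CSN values are hypothetical, and fetching the CSNs from each server's RUV is left out.

```python
def csn_timestamp(csn: str) -> int:
    """Return the Unix time encoded in the first 8 hex digits of a CSN.

    Assumes the 20-hex-digit layout: timestamp (8), sequence number (4),
    replica ID (4), sub-sequence number (4).
    """
    if len(csn) != 20:
        raise ValueError(f"unexpected CSN length: {csn!r}")
    return int(csn[:8], 16)


def replication_lag_seconds(supplier_max_csn: str, consumer_max_csn: str) -> int:
    """Estimate lag as the gap between the two CSN timestamps.

    Clamped at zero so a consumer that is fully caught up never
    reports a negative lag.
    """
    lag = csn_timestamp(supplier_max_csn) - csn_timestamp(consumer_max_csn)
    return max(0, lag)


if __name__ == "__main__":
    # Hypothetical CSNs for illustration only.
    supplier = "528a0d3c000000010000"
    consumer = "528a0c10000000010000"
    print(replication_lag_seconds(supplier, consumer))
```

Note that a figure derived this way only grows while the supplier is receiving writes, so it measures how far behind the consumer is rather than how long a single change takes to arrive.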

Replication performance depends on several factors:
1) Each replication agreement uses one (or more) threads. Each thread consumes CPU core resources and RAM resources.
2) Replication allows only a single supplier at a time. This means, for example, if you have masters A, B, and C, if an update comes into A while B is updating C, the update from A will have to go through B to get to C.
3) Write performance - disks, logging, indexing, etc.
4) Overall server performance.

In general, for performance-related issues, I would start by using logconv.pl to analyze your access logs, and use dbmon.sh to look at your cache usage.

If the logconv.pl that comes with 1.2.10 is not suitable, I would suggest using the latest from the upstream git repository (or if you decide to upgrade to 1.2.11, use that one).
https://git.fedorahosted.org/cgit/389/ds.git/plain/ldap/admin/src/logconv.pl

dbmon.sh is here - https://github.com/richm/scripts/wiki/dbmon.sh


Example: Problem begins:
replication_latency_sec=424

[ .. after a few min .. ]

replication_latency_sec=549
replication_latency_sec=553
replication_latency_sec=555
replication_latency_sec=561
replication_latency_sec=564

[ .. after 10 min or so .. ]

replication_latency_sec=780
replication_latency_sec=763
replication_latency_sec=768
replication_latency_sec=774
replication_latency_sec=779
replication_latency_sec=785
replication_latency_sec=791
replication_latency_sec=796
replication_latency_sec=802
replication_latency_sec=807
replication_latency_sec=813
replication_latency_sec=818
replication_latency_sec=824

Justin.
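
The threshold-based removal from service mentioned in the message could be sketched roughly as follows. This is a minimal illustration, not the poster's actual tooling: the function names, the 300-second threshold, and the "three consecutive samples" rule are all assumptions, and the only thing taken from the thread is the `replication_latency_sec=N` line format.

```python
from typing import Iterable, List, Optional


def parse_latency(line: str) -> Optional[int]:
    """Parse a 'replication_latency_sec=N' line; return None for other lines."""
    key, sep, value = line.strip().partition("=")
    if key != "replication_latency_sec" or not sep:
        return None
    try:
        return int(value)
    except ValueError:
        return None


def collect_samples(lines: Iterable[str]) -> List[int]:
    """Keep only the lines that carry a latency sample, in order."""
    parsed = (parse_latency(line) for line in lines)
    return [sample for sample in parsed if sample is not None]


def should_remove_from_service(samples: List[int],
                               threshold_sec: int = 300,
                               consecutive: int = 3) -> bool:
    """True when the most recent `consecutive` samples all breach the threshold.

    Requiring several breaches in a row (an assumed policy) avoids pulling
    a host out of service because of a single spike.
    """
    recent = samples[-consecutive:]
    return len(recent) == consecutive and all(s >= threshold_sec for s in recent)


if __name__ == "__main__":
    log = """replication_latency_sec=549
[ .. after a few min .. ]
replication_latency_sec=553
replication_latency_sec=555
"""
    samples = collect_samples(log.splitlines())
    print(samples, should_remove_from_service(samples))
```

A matching "return to service" check would likely want a lower threshold than the removal one, so that a host hovering near the limit does not flap in and out of the pool.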
--
389 users mailing list
389-users@xxxxxxxxxxxxxxxxxxxxxxx
https://admin.fedoraproject.org/mailman/listinfo/389-users
