On 10/22/2014 09:54 AM, Shilen Patel wrote:
What is the output of rpm -q 389-ds-base?
I have 6 replicas (two of which
are read-only). I ran into an
issue where a DELETE operation
failed on a server with error code
51 (LDAP_BUSY).
[21/Oct/2014:23:44:44 -0400] conn=78160 op=39510 RESULT err=51 tag=107 nentries=0 etime=3 csn=5447282c000300050000
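For anyone tracing a failure like this across replicas, the RESULT line can be pulled apart mechanically. The following is a minimal Python sketch, not part of the original message. The helper names (parse_result, decode_csn) are hypothetical, and the CSN layout used here (8 hex digits of Unix time, then 4 each for sequence number, replica ID, and sub-sequence number) is an assumption that should be checked against the 389-ds documentation.

```python
import re
from datetime import datetime, timezone

# Hypothetical helpers for illustration; not part of 389-ds itself.
RESULT_RE = re.compile(r"conn=(\d+) op=(\d+) RESULT err=(\d+)")
CSN_RE = re.compile(r"csn=([0-9a-fA-F]{20})")


def parse_result(line):
    """Extract conn, op, err and (if present) csn from an access-log RESULT line."""
    m = RESULT_RE.search(line)
    if m is None:
        return None
    csn = CSN_RE.search(line)
    return {
        "conn": int(m.group(1)),
        "op": int(m.group(2)),
        "err": int(m.group(3)),
        "csn": csn.group(1) if csn else None,
    }


def decode_csn(csn):
    """Split a 20-hex-digit CSN into its fields (assumed layout, see lead-in)."""
    return {
        "time": datetime.fromtimestamp(int(csn[0:8], 16), tz=timezone.utc),
        "seq": int(csn[8:12], 16),
        "replica_id": int(csn[12:16], 16),
        "subseq": int(csn[16:20], 16),
    }


line = ("[21/Oct/2014:23:44:44 -0400] conn=78160 op=39510 RESULT err=51 "
        "tag=107 nentries=0 etime=3 csn=5447282c000300050000")
rec = parse_result(line)
fields = decode_csn(rec["csn"])
print(rec["err"], fields["replica_id"], fields["time"].isoformat())
```

Under that assumed layout, the timestamp part of the CSN decodes to the same instant as the log line's own timestamp, which makes the CSN a convenient key for finding the same update in the other replicas' logs.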
The application retried the
delete several times for a couple
of hours (while the server wasn’t
getting any other requests) and
the result was always the same
(err=51). Each time that
happened, the error log had the
following:
[21/Oct/2014:23:44:44 -0400] - Retry count exceeded in delete
My first question is, what
would cause a problem like this?
I simply restarted that
directory server, and then the update
succeeded. However, when the
update went to the other 5
servers, they failed in the same
way and the same error was logged
in their log files. But the
update wasn’t retried. It was
just skipped and future updates
via replication succeeded on those
5 servers.
My second question is, what’s
the best way to monitor for these
types of replication errors? In this case,
nsds5replicaLastUpdateStatus
did not indicate a problem. If I
had not been looking at the error
file on those 5 hosts, I’m
wondering how I would have known
that a delete failed to replicate
to them. If the answer is to just
have something monitoring the
error log files, are there
specific search strings to look
for to separate out updates that
have failed and won’t be retried
from other errors (e.g. temporary
connection issues)? Just curious
if there is a best practice here.
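As a starting point for log-based monitoring, here is a minimal Python sketch that scans error-log lines for the one message quoted in this thread, "Retry count exceeded in delete". It is an illustration only. The function name find_retry_failures is hypothetical, and the guess that other operation types log the same message with a different final word is an assumption, not something confirmed here. It does not by itself tell skipped updates apart from every other class of error.

```python
import re
from collections import Counter

# Matches lines such as:
#   [21/Oct/2014:23:44:44 -0400] - Retry count exceeded in delete
# The trailing word is captured as the operation; that other operations
# use the same wording is an assumption.
RETRY_RE = re.compile(r"^\[(?P<ts>[^\]]+)\] - Retry count exceeded in (?P<op>\w+)")


def find_retry_failures(lines):
    """Return (timestamp, operation) pairs for every matching error-log line."""
    hits = []
    for line in lines:
        m = RETRY_RE.match(line.strip())
        if m:
            hits.append((m.group("ts"), m.group("op")))
    return hits


sample = [
    "[21/Oct/2014:23:44:44 -0400] - Retry count exceeded in delete",
    # Hypothetical unrelated line, included only to show it is ignored.
    "[21/Oct/2014:23:45:10 -0400] - some unrelated message",
]
hits = find_retry_failures(sample)
counts = Counter(op for _, op in hits)
print(hits, dict(counts))
```

To run it against a real server, pass an open file object for that instance's error log instead of the sample list, and alert when the returned list is non-empty.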
Thanks!
— Shilen