Re: Failed to send extended operation: LDAP error -1 (Can't contact LDAP server)

Graham Leggett <minfrin@xxxxxxxx> · Mon, 5 May 2014 16:55:39 +0200

On 05 May 2014, at 11:37 AM, Graham Leggett <minfrin@xxxxxxxx> wrote:

>> It should be possible to add an N+1th replica to an N-node deployment. Replication agreements are peer-to-peer, so you just add a new replication agreement from each of the servers you want to feed changes to the N+1th (typically all of them).
> 
> What I've learned so far:
> 
> - servera has "syntax checking" switched off, and contains data with syntax errors. The data is 15 years old.
> - serverb has "syntax checking" switched on, but has successfully been able to replicate in the past. Now replication is broken with serverb.
> - serverc has "syntax checking" switched on, and has never been able to replicate. Serverc is brand new.
> 
> What appears to be happening is that during the replication process, an LDAP operation that is accepted on servera is being rejected by serverc. The replication process is brittle, and has not been coded to handle any kind of error during the replication process, and so fails abruptly with "ERROR bulk import abandoned" and no further explanation. The error that triggered the abort is only visible by turning trace logging on.

With a higher level of trace logging I have learned some more.

One of the objects being replicated is a large group containing about 21000 uniqueMember values. When replication reaches this object, it pauses for about 6 seconds and then times out with the following misleading error message:

[05/May/2014:15:33:36 +0100] NSMMReplicationPlugin - agmt="cn=Agreement serverc.example.com" (serverc:636): Failed to send extended operation: LDAP error -1 (Can't contact LDAP server)
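
For anyone grepping their own error logs for the same symptom, here is a rough Python sketch that pulls the agreement name, host, port and LDAP result code out of a line like the one above. The regular expression is my own guess based on this single line, not a documented log format, and the function name parse_replication_error is made up for illustration.

```python
import re

# Pattern inferred from the single log line quoted above. This is an
# assumption, not a documented 389 DS error-log format.
LINE_RE = re.compile(
    r'^\[(?P<timestamp>[^\]]+)\]\s+'          # [05/May/2014:15:33:36 +0100]
    r'(?P<plugin>\S+)\s+-\s+'                 # NSMMReplicationPlugin -
    r'agmt="(?P<agreement>[^"]+)"\s+'         # agmt="cn=Agreement ..."
    r'\((?P<host>[^:()]+):(?P<port>\d+)\):\s+'  # (serverc:636):
    r'(?P<message>.*?):\s+LDAP error\s+(?P<code>-?\d+)\s+'
    r'\((?P<reason>[^)]*)\)\s*$'              # (Can't contact LDAP server)
)


def parse_replication_error(line):
    """Return a dict of fields, or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    fields = m.groupdict()
    fields["port"] = int(fields["port"])
    fields["code"] = int(fields["code"])
    return fields


sample = (
    '[05/May/2014:15:33:36 +0100] NSMMReplicationPlugin - '
    'agmt="cn=Agreement serverc.example.com" (serverc:636): '
    "Failed to send extended operation: LDAP error -1 "
    "(Can't contact LDAP server)"
)

if __name__ == "__main__":
    print(parse_replication_error(sample))
```

The negative code is worth filtering on: as far as I can tell, -1 is a client-side result rather than something the remote server returned, which would explain why the message does not distinguish a timeout from a dropped connection.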

serverc is in Johannesburg, on a far slower connection than servera in DFW and serverb in London. It appears that some kind of timeout kicks in and causes replication to be abandoned abruptly, with no indication of the real cause.

Does anyone know which timeout applies during replication, and how to set it?

Regards,
Graham
--
389 users mailing list
389-users@xxxxxxxxxxxxxxxxxxxxxxx
https://admin.fedoraproject.org/mailman/listinfo/389-users