Here is another issue with replication:
I have two servers, each with a multi-master agreement
(the same configuration as in ticket
https://fedorahosted.org/389/ticket/47942).
We add and delete a large number of groups (943, to be
exact). Each group may reference many entries, up to
about 250 (uniqueMember: dn). The MemberOf plugin is
enabled and works fine. The Referential Integrity plugin
is also enabled, although it only matters when groups are
deleted (or renamed). The run takes a long time (20-30
minutes or more). Some time after the operations start
(typically 5-8 minutes), replication errors appear and
the replica becomes inconsistent for the entries
mentioned in the error log.
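To reproduce a load of this shape without our production data, a minimal Python sketch can generate add and delete LDIF for N groups with 1 to 250 uniqueMember values each. The suffix, container names, group names and member DNs below are hypothetical placeholders, and the member entries are assumed to already exist on the supplier.

```python
"""Generate add/delete LDIF for a replication stress test (hypothetical DNs)."""

SUFFIX = "dc=example,dc=com"          # hypothetical suffix
GROUP_BASE = f"ou=Groupes,{SUFFIX}"   # hypothetical container for the groups
PEOPLE_BASE = f"ou=People,{SUFFIX}"   # hypothetical container for the members


def group_dn(i: int) -> str:
    """DN of the i-th test group."""
    return f"cn=testgroup-{i:04d},{GROUP_BASE}"


def add_ldif(n_groups: int, max_members: int = 250) -> str:
    """Return LDIF adding n_groups groupOfUniqueNames entries.

    Group i gets (i % max_members) + 1 members, so group sizes
    cycle from 1 up to max_members.
    """
    records = []
    for i in range(n_groups):
        n_members = (i % max_members) + 1
        lines = [
            f"dn: {group_dn(i)}",
            "changetype: add",
            "objectClass: top",
            "objectClass: groupOfUniqueNames",
            f"cn: testgroup-{i:04d}",
        ]
        lines += [f"uniqueMember: uid=user{m:05d},{PEOPLE_BASE}"
                  for m in range(n_members)]
        records.append("\n".join(lines))
    # LDIF change records are separated by a blank line.
    return "\n\n".join(records) + "\n"


def delete_ldif(n_groups: int) -> str:
    """Return LDIF deleting the groups created by add_ldif()."""
    records = [f"dn: {group_dn(i)}\nchangetype: delete" for i in range(n_groups)]
    return "\n\n".join(records) + "\n"
```

For example, write `add_ldif(943)` and `delete_ldif(943)` to files and feed them to one supplier with OpenLDAP's `ldapmodify -x -H ldap://<supplier> -D "cn=Directory Manager" -W -f add_groups.ldif`, alternating add and delete passes while watching the consumer's error log.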
When groups are added and deleted, the supplier is fine.
However, on the consumer several group deletions/additions
(from one to four or five) are not replicated. The errors
on the supplier:
[12/Nov/2014:16:46:42 +0100] NSMMReplicationPlugin -
agmt="cn=Replication from ldap-edev.polytechnique.fr to
ldap-model.polytechnique.fr" (ldap-model:636): Consumer
failed to replay change (uniqueid
fa90219d-6a8211e4-a42c901a-94623bee, CSN
546380d6000000020000): Operations error (1). Will retry
later.
[12/Nov/2014:16:47:55 +0100] NSMMReplicationPlugin -
agmt="cn=Replication from ldap-edev.polytechnique.fr to
ldap-model.polytechnique.fr" (ldap-model:636): Consumer
failed to replay change (uniqueid
1e5367ae-6a8311e4-a42c901a-94623bee, CSN
54638125000000020000): Operations error (1). Will retry
later.
[12/Nov/2014:16:53:14 +0100] NSMMReplicationPlugin -
agmt="cn=Replication from ldap-edev.polytechnique.fr to
ldap-model.polytechnique.fr" (ldap-model:636): Consumer
failed to replay change (uniqueid
f4e70b85-6a8311e4-a42c901a-94623bee, CSN
54638260000000020000): Operations error (1). Will retry
later.
[12/Nov/2014:16:55:12 +0100] NSMMReplicationPlugin -
agmt="cn=Replication from ldap-edev.polytechnique.fr to
ldap-model.polytechnique.fr" (ldap-model:636): Consumer
failed to replay change (uniqueid
3c6d978a-6a8411e4-a42c901a-94623bee, CSN
546382d6000400020000): Operations error (1). Will retry
later.
[12/Nov/2014:16:56:31 +0100] NSMMReplicationPlugin -
agmt="cn=Replication from ldap-edev.polytechnique.fr to
ldap-model.polytechnique.fr" (ldap-model:636): Consumer
failed to replay change (uniqueid
6030dd93-6a8411e4-a42c901a-94623bee, CSN
54638325000000020000): Operations error (1). Will retry
later.
[12/Nov/2014:16:57:22 +0100] NSMMReplicationPlugin -
agmt="cn=Replication from ldap-edev.polytechnique.fr to
ldap-model.polytechnique.fr" (ldap-model:636): Consumer
failed to replay change (uniqueid
83f42395-6a8411e4-a42c901a-94623bee, CSN
5463835d000000020000): Operations error (1). Will retry
later.
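The CSNs in these messages can be decoded to see when and where each failing change originated. This is a minimal sketch, assuming the usual 389 DS CSN layout of 20 hex digits: 8 for the Unix timestamp, then 4 each for the sequence number, the originating replica ID and the sub-sequence number.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class CSN(NamedTuple):
    timestamp: datetime  # time of the change on the originating replica
    seqnum: int          # orders changes made within the same second
    replica_id: int      # replica ID of the supplier where the change originated
    subseqnum: int


def parse_csn(csn: str) -> CSN:
    """Split a 20-hex-digit CSN into its fields (8 + 4 + 4 + 4 hex digits)."""
    if len(csn) != 20:
        raise ValueError(f"expected 20 hex digits, got {csn!r}")
    return CSN(
        timestamp=datetime.fromtimestamp(int(csn[0:8], 16), tz=timezone.utc),
        seqnum=int(csn[8:12], 16),
        replica_id=int(csn[12:16], 16),
        subseqnum=int(csn[16:20], 16),
    )


# CSNs reported by the supplier above.
FAILED = [
    "546380d6000000020000",
    "54638125000000020000",
    "54638260000000020000",
    "546382d6000400020000",
    "54638325000000020000",
    "5463835d000000020000",
]

if __name__ == "__main__":
    for s in FAILED:
        c = parse_csn(s)
        print(s, c.timestamp.isoformat(), "rid", c.replica_id, "seq", c.seqnum)
```

If that layout is right, all six failing CSNs carry replica ID 2, so they originated on the same supplier, and the first one (546380d6...) decodes to 15:46:30 UTC, i.e. 16:46:30 +0100, about ten seconds before the first supplier error.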
The corresponding errors on the consumer seem to point to
deadlocks in these cases:
[12/Nov/2014:16:46:41 +0100] NSMMReplicationPlugin -
changelog program - _cl5WriteOperationTxn: retry (49) the
transaction (csn=546380d6000000020000) failed (rc=-30993
(BDB0068 DB_LOCK_DEADLOCK: Locker killed to resolve a
deadlock))
[12/Nov/2014:16:46:41 +0100] NSMMReplicationPlugin -
changelog program - _cl5WriteOperationTxn: failed to write
entry with csn (546380d6000000020000); db error - -30993
BDB0068 DB_LOCK_DEADLOCK: Locker killed to resolve a
deadlock
[12/Nov/2014:16:46:41 +0100] NSMMReplicationPlugin -
write_changelog_and_ruv: can't add a change for
cn=LAN452ESP-2014,ou=2014,ou=Cours,ou=Enseignement,ou=Groupes,dc=id,dc=polytechnique,dc=edu
(uniqid: fa90219d-6a8211e4-a42c901a-94623bee, optype: 16)
to changelog csn 546380d6000000020000
[12/Nov/2014:16:47:54 +0100] NSMMReplicationPlugin -
changelog program - _cl5WriteOperationTxn: retry (49) the
transaction (csn=54638125000000020000) failed (rc=-30993
(BDB0068 DB_LOCK_DEADLOCK: Locker killed to resolve a
deadlock))
[12/Nov/2014:16:47:54 +0100] NSMMReplicationPlugin -
changelog program - _cl5WriteOperationTxn: failed to write
entry with csn (54638125000000020000); db error - -30993
BDB0068 DB_LOCK_DEADLOCK: Locker killed to resolve a
deadlock
[12/Nov/2014:16:47:54 +0100] NSMMReplicationPlugin -
write_changelog_and_ruv: can't add a change for
cn=LAN472EFLE-2014,ou=2014,ou=Cours,ou=Enseignement,ou=Groupes,dc=id,dc=polytechnique,dc=edu
(uniqid: 1e5367ae-6a8311e4-a42c901a-94623bee, optype: 16)
to changelog csn 54638125000000020000
[12/Nov/2014:16:53:13 +0100] NSMMReplicationPlugin -
changelog program - _cl5WriteOperationTxn: retry (49) the
transaction (csn=54638260000000020000) failed (rc=-30993
(BDB0068 DB_LOCK_DEADLOCK: Locker killed to resolve a
deadlock))
[12/Nov/2014:16:53:13 +0100] NSMMReplicationPlugin -
changelog program - _cl5WriteOperationTxn: failed to write
entry with csn (54638260000000020000); db error - -30993
BDB0068 DB_LOCK_DEADLOCK: Locker killed to resolve a
deadlock
[12/Nov/2014:16:53:13 +0100] NSMMReplicationPlugin -
write_changelog_and_ruv: can't add a change for
cn=MAT471-2014,ou=2014,ou=Cours,ou=Enseignement,ou=Groupes,dc=id,dc=polytechnique,dc=edu
(uniqid: f4e70b85-6a8311e4-a42c901a-94623bee, optype: 16)
to changelog csn 54638260000000020000
[12/Nov/2014:16:55:11 +0100] NSMMReplicationPlugin -
changelog program - _cl5WriteOperationTxn: retry (49) the
transaction (csn=546382d6000400020000) failed (rc=-30993
(BDB0068 DB_LOCK_DEADLOCK: Locker killed to resolve a
deadlock))
[12/Nov/2014:16:55:11 +0100] NSMMReplicationPlugin -
changelog program - _cl5WriteOperationTxn: failed to write
entry with csn (546382d6000400020000); db error - -30993
BDB0068 DB_LOCK_DEADLOCK: Locker killed to resolve a
deadlock
[12/Nov/2014:16:55:11 +0100] NSMMReplicationPlugin -
write_changelog_and_ruv: can't add a change for
cn=MEC592-2014,ou=2014,ou=Cours,ou=Enseignement,ou=Groupes,dc=id,dc=polytechnique,dc=edu
(uniqid: 3c6d978a-6a8411e4-a42c901a-94623bee, optype: 16)
to changelog csn 546382d6000400020000
[12/Nov/2014:16:56:29 +0100] NSMMReplicationPlugin -
changelog program - _cl5WriteOperationTxn: retry (49) the
transaction (csn=54638325000000020000) failed (rc=-30993
(BDB0068 DB_LOCK_DEADLOCK: Locker killed to resolve a
deadlock))
[12/Nov/2014:16:56:29 +0100] NSMMReplicationPlugin -
changelog program - _cl5WriteOperationTxn: failed to write
entry with csn (54638325000000020000); db error - -30993
BDB0068 DB_LOCK_DEADLOCK: Locker killed to resolve a
deadlock
[12/Nov/2014:16:56:29 +0100] NSMMReplicationPlugin -
write_changelog_and_ruv: can't add a change for
cn=PHY566-2014,ou=2014,ou=Cours,ou=Enseignement,ou=Groupes,dc=id,dc=polytechnique,dc=edu
(uniqid: 6030dd93-6a8411e4-a42c901a-94623bee, optype: 16)
to changelog csn 54638325000000020000
[12/Nov/2014:16:57:20 +0100] NSMMReplicationPlugin -
changelog program - _cl5WriteOperationTxn: retry (49) the
transaction (csn=5463835d000000020000) failed (rc=-30993
(BDB0068 DB_LOCK_DEADLOCK: Locker killed to resolve a
deadlock))
[12/Nov/2014:16:57:20 +0100] NSMMReplicationPlugin -
changelog program - _cl5WriteOperationTxn: failed to write
entry with csn (5463835d000000020000); db error - -30993
BDB0068 DB_LOCK_DEADLOCK: Locker killed to resolve a
deadlock
[12/Nov/2014:16:57:20 +0100] NSMMReplicationPlugin -
write_changelog_and_ruv: can't add a change for
cn=PHY651K-2014,ou=2014,ou=Cours,ou=Enseignement,ou=Groupes,dc=id,dc=polytechnique,dc=edu
(uniqid: 83f42395-6a8411e4-a42c901a-94623bee, optype: 16)
to changelog csn 5463835d000000020000
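To check that every supplier-side "Operations error (1)" matches a consumer-side changelog deadlock, the two logs can be joined on the CSN. This is a minimal, hypothetical Python helper written against the message formats quoted above (it tolerates the line wrapping). The optype names come from my reading of the SLAPI_OPERATION_* flags in slapi-plugin.h, where 0x10 (16) is ADD; treat that mapping as an assumption.

```python
import re

# Consumer: "write_changelog_and_ruv: can't add a change for <dn> (uniqid: ..., optype: N) to changelog csn <csn>"
_CANT_ADD = re.compile(
    r"can't add a change for\s+(?P<dn>\S+)\s+\(uniqid: (?P<uniqid>[0-9a-f-]+), "
    r"optype: (?P<optype>\d+)\)\s+to changelog csn (?P<csn>[0-9a-f]{20})"
)
# Supplier: "failed to replay change (uniqueid <uniqid>, CSN <csn>): <error>. Will retry later."
_REPLAY_FAILED = re.compile(
    r"uniqueid\s+(?P<uniqid>[0-9a-f-]+), CSN\s+(?P<csn>[0-9a-f]{20})\): (?P<err>[^.]+)\."
)

# Assumed mapping of SLAPI_OPERATION_* flag values to operation names.
OPTYPES = {0x08: "modify", 0x10: "add", 0x20: "delete", 0x40: "modrdn"}


def correlate(supplier_log: str, consumer_log: str) -> list[tuple[str, str, str, str]]:
    """Return (csn, dn, operation, supplier error) for CSNs present in both logs."""
    consumer = {m["csn"]: m for m in _CANT_ADD.finditer(consumer_log)}
    rows = []
    for m in _REPLAY_FAILED.finditer(supplier_log):
        c = consumer.get(m["csn"])
        if c is None or c["uniqid"] != m["uniqid"]:
            continue  # no matching consumer-side changelog failure
        op = OPTYPES.get(int(c["optype"]), f"optype {c['optype']}")
        rows.append((m["csn"], c["dn"], op, m["err"]))
    return rows
```

Run against the excerpts above, each of the six supplier CSNs should pair with one of the six group DNs; if the optype mapping is correct, optype 16 would mean these particular failures were all group additions.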