Fabio Isgrò wrote:
> Hi All,
>
> I'm writing this mail to report a strange behavior in 389-ds when
> configured for Multi-Master Replication (MMR).
>
> The scenario is very simple: an application uses it as its authentication
> base, and access to the LDAP server is managed by a software load balancer
> with two nodes.
>
> The strange thing I noticed is that every 48 hours the server must be killed
> and restarted because it doesn't respond to any request, even though the port
> and the process are still alive. Sometimes MMR also stops because of a
> nonexistent conflict in some entry.
>
Could this be https://bugzilla.redhat.com/show_bug.cgi?id=547503 ?
> But today the disastrous thing happened: MMR doesn't work anymore.
> Checking the error log on the first node, I found something strange:
>
>> [17/Feb/2010:13:11:45 +0100] NSMMReplicationPlugin - Replication agreement for agmt="cn=srvprd-l011v->srvprd-l012v" (172:389) could not be updated. For replication to take place, please enable the suffix and restart the server
>> [17/Feb/2010:13:13:25 +0100] NSMMReplicationPlugin - Total update aborted: Replication agreement for "agmt="cn=srvprd-l011v->srvprd-l012v" (172:389)" can not be updated while the replica is disabled
>> [17/Feb/2010:13:13:25 +0100] NSMMReplicationPlugin - (If the suffix is disabled you must enable it then restart the server for replication to take place).
>> [17/Feb/2010:13:14:50 +0100] NSMMReplicationPlugin - conn=30964 op=3 repl="ou=ext,o=MYroot": Begin incremental protocol
>> [17/Feb/2010:13:14:50 +0100] NSMMReplicationPlugin - conn=30964 op=3 replica="unknown": Unable to acquire replica: error: no such replica
>> [17/Feb/2010:13:14:50 +0100] NSMMReplicationPlugin - conn=30964 op=3 repl="ou=ext,o=MYroot": StartNSDS50ReplicationRequest: response=6 rc=0
>> [17/Feb/2010:13:16:27 +0100] NSMMReplicationPlugin - Total update aborted: Replication agreement for "agmt="cn=srvprd-l011v->srvprd-l012v" (172:389)" can not be updated while the replica is disabled
>> [17/Feb/2010:13:16:27 +0100] NSMMReplicationPlugin - (If the suffix is disabled you must enable it then restart the server for replication to take place).
>>
>
> But both the replica and the suffix are already enabled!!!! The first
> thing I did was restart the instance, but it printed this:
>
> Starting dirsrv:
> srvprd-l011v... 389-Directory/1.2.5 B2010.012.2033
> srvprd-l011v.MYroot.com:389 (/etc/dirsrv/slapd-srvprd-l011v)
>
> [17/Feb/2010:13:32:47 +0100] - 389-Directory/1.2.5 B2010.012.2033 starting up
> [17/Feb/2010:13:32:47 +0100] NSMMReplicationPlugin - _replica_init_from_config: failed to create csn generator for replica (cn=replica,cn=\22ou=ext,o=MYroot\22,cn=mapping tree, cn=config)
> [17/Feb/2010:13:32:47 +0100] - replica_destroy
> [17/Feb/2010:13:32:47 +0100] NSMMReplicationPlugin - Unable to configure replica ou=esterni,o=siae: failed to create csn generator for replica (cn=replica,cn=\22ou=ext,o=MYroot\22,cn=mapping tree, cn=config)
> [17/Feb/2010:13:32:47 +0100] NSMMReplicationPlugin - changelog program - _cl5CheckGuardian: found old style of guardian file: bdb/4.3/libreplication-plugin
> [17/Feb/2010:13:32:47 +0100] NSMMReplicationPlugin - changelog program - _cl5DBOpen: file d0bbdd82-1dd111b2-a05ca5d6-b0600000_4b699709000000010000.db4 has no matching replica; removing
> [17/Feb/2010:13:32:47 +0100] NSMMReplicationPlugin - changelog program - _cl5DBOpen: failed to remove (???fU?) file; libdb error - 2 (No such file or directory)
> [17/Feb/2010:13:32:47 +0100] NSMMReplicationPlugin - changelog program - _cl5DBOpen: opened 0 existing databases in /var/lib/dirsrv/slapd-srvprd-l011v/changelogdb
> [17/Feb/2010:13:32:47 +0100] NSMMReplicationPlugin - Found replication agreement named "cn=srvprd-l011v->srvprd-l012v, cn=replica, cn="ou=ext,o=MYroot", cn=mapping tree, cn=config".
> [17/Feb/2010:13:32:47 +0100] NSMMReplicationPlugin - The replication agreement named "cn=srvprd-l011v->srvprd-l012v, cn=replica, cn="ou=ext,o=MYroot", cn=mapping tree, cn=config" could not be correctly parsed. No replication will occur with this replica.
> [17/Feb/2010:13:32:47 +0100] NSMMReplicationPlugin - agmtlist_config_init: found 0 replication agreements in DIT
> [17/Feb/2010:13:32:47 +0100] - slapd started. Listening on All Interfaces port 389 for LDAP requests
>
>
This seems like the replication agreement entry and/or the CSN generator entry was somehow corrupted. Can you post relevant excerpts from your dse.ldif file?
> For business continuity I had to restore MMR as soon as possible, and
> I did it by removing the replicas and the changelog and recreating them
> from the ground up.
>
> What can I do to make MMR more reliable???
>
> Best Regards
> Fabio Isgrò
>
>
> --
> 389 users mailing list
> 389-users at lists.fedoraproject.org
> https://admin.fedoraproject.org/mailman/listinfo/389-users
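For the request to post relevant dse.ldif excerpts, only a few entries are likely to matter: the replica entry (objectClass nsDS5Replica; we assume its nsState attribute is where the CSN generator state lives), the replication agreement entries (objectClass nsDS5ReplicationAgreement), and the changelog configuration entry cn=changelog5,cn=config. The Python sketch below pulls just those entries out of a dse.ldif file and redacts nsDS5ReplicaCredentials and userPassword values before anything is posted to a public list. It is a hypothetical helper (extract_replication_config is not part of 389-ds), it assumes the file is UTF-8 LDIF, and it only handles the LDIF basics (continuation lines, comments, blank-line entry separators).

```python
import sys
from typing import List, Tuple

# Object classes (lower-cased) of the entries that matter when debugging MMR.
RELEVANT_OBJECTCLASSES = {"nsds5replica", "nsds5replicationagreement"}
# Changelog configuration entry, matched on its DN (lower-cased, spaces removed).
CHANGELOG_DN = "cn=changelog5,cn=config"
# Attributes (lower-cased) whose values should never be posted to a public list.
SECRET_ATTRS = {"nsds5replicacredentials", "userpassword"}


def unfold(text: str) -> List[str]:
    """Join LDIF continuation lines (leading single space) onto the previous line."""
    lines: List[str] = []
    for raw in text.splitlines():
        if raw.startswith(" ") and lines:
            lines[-1] += raw[1:]
        else:
            lines.append(raw)
    return lines


def split_entries(lines: List[str]) -> List[List[str]]:
    """Group unfolded lines into entries separated by blank lines, dropping comments."""
    entries: List[List[str]] = []
    current: List[str] = []
    for line in lines:
        if not line.strip():
            if current:
                entries.append(current)
                current = []
        elif not line.startswith("#"):
            current.append(line)
    if current:
        entries.append(current)
    return entries


def split_attr(line: str) -> Tuple[str, str]:
    """Return (attribute name, value) for an 'attr: value' or 'attr:: base64' line."""
    name, _, rest = line.partition(":")
    return name, rest.lstrip(":").strip()


def is_relevant(entry: List[str]) -> bool:
    """True for replica, replication agreement and changelog configuration entries."""
    for line in entry:
        name, value = split_attr(line)
        if name.lower() == "objectclass" and value.lower() in RELEVANT_OBJECTCLASSES:
            return True
        if name.lower() == "dn" and value.lower().replace(" ", "") == CHANGELOG_DN:
            return True
    return False


def redact(entry: List[str]) -> List[str]:
    """Replace the values of secret attributes with a placeholder."""
    out: List[str] = []
    for line in entry:
        name, _ = split_attr(line)
        # Ignore attribute options such as ';binary' when comparing names.
        if name.split(";", 1)[0].lower() in SECRET_ATTRS:
            out.append(f"{name}: <redacted>")
        else:
            out.append(line)
    return out


def extract_replication_config(text: str) -> str:
    """Return the replication-related entries of a dse.ldif text, secrets redacted."""
    entries = split_entries(unfold(text))
    kept = ["\n".join(redact(e)) for e in entries if is_relevant(e)]
    return "\n\n".join(kept) + ("\n" if kept else "")


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage (path assumed from the startup message above):
    #   python extract_repl.py /etc/dirsrv/slapd-srvprd-l011v/dse.ldif
    with open(sys.argv[1], encoding="utf-8") as fh:
        print(extract_replication_config(fh.read()), end="")
```

Reading dse.ldif this way does not modify it; the output keeps each entry's DN and attributes as they are stored (including the base64 nsState value), so a corrupted agreement or replica entry should still be visible in what gets posted.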