Re: [389-users] Multi-Master Replication doesn't work after a while

Fabio Isgrò wrote:
> Hi All,
>
> I'm writing this mail to report a strange behavior in 389-ds when
> configured for Multi-Master Replication (MMR).
>
> The scenario is very simple: an application uses it as its authentication
> base, and access to the LDAP server is regulated by a software load
> balancer with two nodes.
>
> The strange thing I noticed is that every 48 hours the server must be
> killed and restarted because it doesn't respond to any request, even
> though the port and the process are still alive. Sometimes MMR also
> stops due to a nonexistent conflict in some entry.
>   
Could this be https://bugzilla.redhat.com/show_bug.cgi?id=547503 ?
> But today the disastrous thing happened: MMR doesn't work anymore.
> The error log on the first node reported something strange:
>   
>> [17/Feb/2010:13:11:45 +0100] NSMMReplicationPlugin - Replication agreement for agmt="cn=srvprd-l011v->srvprd-l012v" (172:389) could not be updated. For replication to take place, please enable the suffix and restart the server
>> [17/Feb/2010:13:13:25 +0100] NSMMReplicationPlugin - Total update aborted: Replication agreement for "agmt="cn=srvprd-l011v->srvprd-l012v" (172:389)" can not be updated while the replica is disabled
>> [17/Feb/2010:13:13:25 +0100] NSMMReplicationPlugin - (If the suffix is disabled you must enable it then restart the server for replication to take place).
>> [17/Feb/2010:13:14:50 +0100] NSMMReplicationPlugin - conn=30964 op=3 repl="ou=ext,o=MYroot": Begin incremental protocol
>> [17/Feb/2010:13:14:50 +0100] NSMMReplicationPlugin - conn=30964 op=3 replica="unknown": Unable to acquire replica: error: no such replica
>> [17/Feb/2010:13:14:50 +0100] NSMMReplicationPlugin - conn=30964 op=3 repl="ou=ext,o=MYroot": StartNSDS50ReplicationRequest: response=6 rc=0
>> [17/Feb/2010:13:16:27 +0100] NSMMReplicationPlugin - Total update aborted: Replication agreement for "agmt="cn=srvprd-l011v->srvprd-l012v" (172:389)" can not be updated while the replica is disabled
>> [17/Feb/2010:13:16:27 +0100] NSMMReplicationPlugin - (If the suffix is disabled you must enable it then restart the server for replication to take place).
>>     

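When the errors log is long, it can help to pull out only the replication plugin lines and see how often each one repeats and when it first appeared. A minimal Python sketch of that, assuming lines follow the "[timestamp] NSMMReplicationPlugin - message" format quoted above and that the log is at /var/log/dirsrv/slapd-INSTANCE/errors (the usual location, but that path and the script name are assumptions, so check your install):

```python
import re
import sys
from datetime import datetime

# Matches lines such as:
# [17/Feb/2010:13:13:25 +0100] NSMMReplicationPlugin - Total update aborted: ...
LINE_RE = re.compile(r"^\[(?P<ts>[^\]]+)\] (?P<sub>\S+) - (?P<msg>.*)$")


def replication_messages(lines, subsystem="NSMMReplicationPlugin"):
    """Yield (timestamp, message) for lines logged by the given subsystem."""
    for line in lines:
        m = LINE_RE.match(line.rstrip("\r\n"))
        if m and m.group("sub") == subsystem:
            # Format assumed from the quoted log, e.g. 17/Feb/2010:13:13:25 +0100
            ts = datetime.strptime(m.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
            yield ts, m.group("msg")


def summarize(lines):
    """Return {message: (count, first_seen)} for replication plugin messages."""
    summary = {}
    for ts, msg in replication_messages(lines):
        count, first = summary.get(msg, (0, ts))
        summary[msg] = (count + 1, first)
    return summary


def main(argv):
    if len(argv) != 2:
        print("usage: repl_log_summary.py /var/log/dirsrv/slapd-INSTANCE/errors",
              file=sys.stderr)
        return 2
    with open(argv[1], encoding="utf-8", errors="replace") as f:
        for msg, (count, first) in summarize(f).items():
            print(f"{count:5d}  first seen {first.isoformat()}  {msg}")
    return 0


if __name__ == "__main__":
    main(sys.argv)
```

Lines that don't match the pattern (e.g. "- replica_destroy") are simply skipped, so this only narrows things down; it doesn't replace reading the full log around the first failure.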
>
> But both the replica and the suffix are already enabled! The first
> thing I did was restart the instance, but it printed this:
>
> Starting dirsrv:
> srvprd-l011v... 389-Directory/1.2.5 B2010.012.2033
> srvprd-l011v.MYroot.com:389 (/etc/dirsrv/slapd-srvprd-l011v)
>
> [17/Feb/2010:13:32:47 +0100] - 389-Directory/1.2.5 B2010.012.2033 starting up
> [17/Feb/2010:13:32:47 +0100] NSMMReplicationPlugin - _replica_init_from_config: failed to create csn generator for replica (cn=replica,cn=\22ou=ext,o=MYroot\22,cn=mapping tree, cn=config)
> [17/Feb/2010:13:32:47 +0100] - replica_destroy
> [17/Feb/2010:13:32:47 +0100] NSMMReplicationPlugin - Unable to configure replica ou=esterni,o=siae: failed to create csn generator for replica (cn=replica,cn=\22ou=ext,o=MYroot\22,cn=mapping tree, cn=config)
> [17/Feb/2010:13:32:47 +0100] NSMMReplicationPlugin - changelog program - _cl5CheckGuardian: found old style of guardian file: bdb/4.3/libreplication-plugin
> [17/Feb/2010:13:32:47 +0100] NSMMReplicationPlugin - changelog program - _cl5DBOpen: file d0bbdd82-1dd111b2-a05ca5d6-b0600000_4b699709000000010000.db4 has no matching replica; removing
> [17/Feb/2010:13:32:47 +0100] NSMMReplicationPlugin - changelog program - _cl5DBOpen: failed to remove (�ҷ�fU�) file; libdb error - 2 (No such file or directory)
> [17/Feb/2010:13:32:47 +0100] NSMMReplicationPlugin - changelog program - _cl5DBOpen: opened 0 existing databases in /var/lib/dirsrv/slapd-srvprd-l011v/changelogdb
> [17/Feb/2010:13:32:47 +0100] NSMMReplicationPlugin - Found replication agreement named "cn=srvprd-l011v->srvprd-l012v, cn=replica, cn="ou=ext,o=MYroot", cn=mapping tree, cn=config".
> [17/Feb/2010:13:32:47 +0100] NSMMReplicationPlugin - The replication agreement named "cn=srvprd-l011v->srvprd-l012v, cn=replica, cn="ou=ext,o=MYroot", cn=mapping tree, cn=config" could not be correctly parsed. No replication will occur with this replica.
> [17/Feb/2010:13:32:47 +0100] NSMMReplicationPlugin - agmtlist_config_init: found 0 replication agreements in DIT
> [17/Feb/2010:13:32:47 +0100] - slapd started.  Listening on All Interfaces port 389 for LDAP requests
>
>   
This seems like the replication agreement entry and/or the CSN generator 
entry was somehow corrupted.  Can you post relevant excerpts from your 
dse.ldif file?
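To gather those excerpts, a minimal Python sketch like the one below could print only the replica and agreement entries from dse.ldif, which lives in the instance config directory (/etc/dirsrv/slapd-srvprd-l011v in the startup output above). It assumes, rather than anything stated in this thread, that those entries are the ones with objectClass nsDS5Replica or nsDS5ReplicationAgreement. I believe the CSN generator state is kept in the replica entry's nsState attribute, which will appear base64-encoded. The agreement's bind password (nsDS5ReplicaCredentials) is redacted before printing; the script name is hypothetical. Run it against a copy of dse.ldif, not the live file.

```python
import sys

# Assumed object classes for replica and agreement entries under cn=config.
REPL_CLASSES = {"nsds5replica", "nsds5replicationagreement"}
# Assumed attribute holding the agreement's bind password; redacted before posting.
SECRET_ATTRS = {"nsds5replicacredentials"}


def read_ldif_entries(lines):
    """Yield each LDIF entry as a list of unfolded 'attr: value' lines."""
    entry, current = [], None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith(" "):
            # Continuation line (RFC 2849): join onto the previous line.
            if current is not None:
                current += line[1:]
            continue
        if current is not None:
            entry.append(current)
            current = None
        if line == "":
            if entry:
                yield entry
                entry = []
        elif not line.startswith("#"):
            current = line
    if current is not None:
        entry.append(current)
    if entry:
        yield entry


def is_replication_entry(entry):
    """True if the entry carries one of the replication object classes."""
    for line in entry:
        attr, _, value = line.partition(":")
        if attr.lower() == "objectclass" and value.strip(": ").lower() in REPL_CLASSES:
            return True
    return False


def redact(entry):
    """Replace the values of secret attributes with a placeholder."""
    out = []
    for line in entry:
        attr = line.partition(":")[0]
        out.append(attr + ": <redacted>" if attr.lower() in SECRET_ATTRS else line)
    return out


def main(argv):
    if len(argv) != 2:
        print("usage: dump_repl_config.py /path/to/copy/of/dse.ldif", file=sys.stderr)
        return 2
    with open(argv[1], encoding="utf-8", errors="replace") as f:
        for entry in read_ldif_entries(f):
            if is_replication_entry(entry):
                print("\n".join(redact(entry)))
                print()
    return 0


if __name__ == "__main__":
    main(sys.argv)
```

Matching on object classes rather than on "cn=replica" in the DN avoids missing entries whose DN is folded across lines or quoted differently (note the \22 escapes in the log above).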
> For business-continuity reasons I restored MMR as soon as possible,
> and I had to do it by removing the replicas and the changelog and
> recreating them from the ground up.
>
> What can I do to make MMR more reliable?
>
> Best Regards
> Fabio Isgrò
>
>
> --
> 389 users mailing list
> 389-users@xxxxxxxxxxxxxxxxxxxxxxx
> https://admin.fedoraproject.org/mailman/listinfo/389-users

