Steffen Blume wrote:
> Rich Megginson wrote:
>> Steffen Blume wrote:
>>> Hi,
>>>
>>> I have tried to set up multi-master replication without success. The
>>> two LDAP servers are running fine. Then I execute the mmr.pl script (on b):
>>>
>>> ./mmr.pl --host1 a.domain.local --host2 b.domain.local --bindpw secret \
>>>     --host1_id 1 --host2_id 2 --repmanpw secret \
>>>     --base "dc=domain, dc=local" --create
>>>
>>> --- error log on a ---
>>> [01/Sep/2010:14:11:39 +0200] NSMMReplicationPlugin - agmt="cn="Replication to b.domain.local"" (b:389): Replica has a different generation ID than the local data.
>>> [01/Sep/2010:14:11:42 +0200] NSMMReplicationPlugin - Beginning total update of replica "agmt="cn="Replication to b.domain.local"" (b:389)".
>>> [01/Sep/2010:14:11:47 +0200] NSMMReplicationPlugin - Finished total update of replica "agmt="cn="Replication to b.domain.local"" (b:389)". Sent 1375 entries.
>>> --------------------
>>>
>>> --- error log on b ---
>>> [01/Sep/2010:14:11:39 +0200] NSMMReplicationPlugin - agmt="cn="Replication to a.domain.local"" (a:389): Replica has a different generation ID than the local data.
>>> [01/Sep/2010:14:11:40 +0200] NSMMReplicationPlugin - repl_set_mtn_referrals: could not set referrals for replica dc=domain,dc=local: 32
>>> [01/Sep/2010:14:11:40 +0200] NSMMReplicationPlugin - multimaster_be_state_change: replica dc=domain,dc=local is going offline; disabling replication
>>> [01/Sep/2010:14:11:41 +0200] - somehow, there are still 200 entries in the entry cache. :/
>>> [01/Sep/2010:14:11:42 +0200] - WARNING: Import is running with nsslapd-db-private-import-mem on; No other process is allowed to access the database
>>> [01/Sep/2010:14:11:46 +0200] - import userRoot: Workers finished; cleaning up...
>>> [01/Sep/2010:14:11:46 +0200] - import userRoot: Workers cleaned up.
>>> [01/Sep/2010:14:11:46 +0200] - import userRoot: Indexing complete. Post-processing...
>>> [01/Sep/2010:14:11:46 +0200] - import userRoot: Flushing caches...
>>> [01/Sep/2010:14:11:46 +0200] - import userRoot: Closing files...
>>> [01/Sep/2010:14:11:46 +0200] - somehow, there are still 200 entries in the entry cache. :/
>>> [01/Sep/2010:14:11:47 +0200] - import userRoot: Import complete. Processed 1375 entries in 5 seconds. (275.00 entries/sec)
>>> [01/Sep/2010:14:11:47 +0200] NSMMReplicationPlugin - multimaster_be_state_change: replica dc=domain,dc=local is coming online; enabling replication
>>> [01/Sep/2010:14:11:47 +0200] NSMMReplicationPlugin - _replica_configure_ruv: failed to create replica ruv tombstone entry (dc=domain, dc=local); LDAP error - 68
>>
>> This means the RUV entry or some other MMR state information was left
>> over from a previous configuration attempt. Err=68 is Already Exists -
>> the entry already exists.
>>
>> Since this fails, nothing else is going to work.
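>>
>> If you want to check, the database RUV lives in a tombstone entry
>> directly under the suffix, so something along these lines should show
>> whether one was left behind (adjust host, bind DN and password to your
>> setup):
>>
>> ldapsearch -xLLL -h b.domain.local -D "cn=directory manager" -w secret \
>>     -b "dc=domain,dc=local" \
>>     "(&(objectclass=nstombstone)(nsuniqueid=ffffffff-ffffffff-ffffffff-ffffffff))" \
>>     nsds50ruv
>>
>> If that returns an entry on a supposedly clean server, that is the
>> leftover state.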
>>
>>> [01/Sep/2010:14:11:47 +0200] NSMMReplicationPlugin - replica_enable_replication: reloading ruv failed
>>> [01/Sep/2010:14:11:49 +0200] NSMMReplicationPlugin - _replica_configure_ruv: failed to create replica ruv tombstone entry (dc=domain, dc=local); LDAP error - 68
>>> [01/Sep/2010:14:12:19 +0200] NSMMReplicationPlugin - _replica_configure_ruv: failed to create replica ruv tombstone entry (dc=domain, dc=local); LDAP error - 68
>>> [01/Sep/2010:14:12:49 +0200] NSMMReplicationPlugin - _replica_configure_ruv: failed to create replica ruv tombstone entry (dc=domain, dc=local); LDAP error - 68
>>> [01/Sep/2010:14:13:19 +0200] NSMMReplicationPlugin - _replica_configure_ruv: failed to create replica ruv tombstone entry (dc=domain, dc=local); LDAP error - 68
>>> [01/Sep/2010:14:13:49 +0200] NSMMReplicationPlugin - _replica_configure_ruv: failed to create replica ruv tombstone entry (dc=domain, dc=local); LDAP error - 68
>>> --------------------
>>>
>>> So what do the errors "repl_set_mtn_referrals: could not set referrals"
>>> and "_replica_configure_ruv: failed to create replica ruv tombstone
>>> entry" mean?
>>>
>>> The messages on b stop when I restart the LDAP server. But the
>>> replication is not working.
>>
>> Since MMR setup failed, no MMR is going to work.
>>
>>> On the first replication setup not all the data was copied. I removed
>>> the replication configuration with mmr.pl
>>
>> I think this is the problem. Either mmr.pl does not cleanly remove the
>> replication configuration, or there is a bug in the server. For
>> example, see https://bugzilla.redhat.com/show_bug.cgi?id=624442
>>
>>> and set it up again, with the same error messages.
>>> When I change something (in uid=sbl,ou=people,...) on a, the error log
>>> of a shows:
>>>
>>> --- error log on a ---
>>> [01/Sep/2010:14:35:20 +0200] NSMMReplicationPlugin - agmt="cn="Replication to b.domain.local"" (b:389): Replica has a different generation ID than the local data.
>>> [01/Sep/2010:14:35:24 +0200] NSMMReplicationPlugin - agmt="cn="Replication to b.domain.local"" (b:389): Replica has a different generation ID than the local data.
>>> [01/Sep/2010:14:35:28 +0200] NSMMReplicationPlugin - agmt="cn="Replication to b.domain.local"" (b:389): Replica has a different generation ID than the local data.
>>> ...
>>> --------------------
>>
>> This means the consumer was not initialized properly.
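>>
>> Once the leftover state is cleaned up, you can force a fresh total
>> init from a by setting nsds5BeginReplicaRefresh on the agreement,
>> roughly like this (assuming the agreement DN mmr.pl created):
>>
>> ldapmodify -x -h a.domain.local -D "cn=directory manager" -w secret <<EOF
>> dn: cn=Replication to b.domain.local,cn=replica,cn="dc=domain,dc=local",cn=mapping tree,cn=config
>> changetype: modify
>> replace: nsds5BeginReplicaRefresh
>> nsds5BeginReplicaRefresh: start
>> EOF
>>
>> If the init succeeds, the "different generation ID" messages should stop.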
>>> Nothing in the error log on b, but in the access log:
>>>
>>> --- access log on b ---
>>> [01/Sep/2010:14:35:20 +0200] conn=0 op=3 SRCH base="ou=People, dc=domain, dc=local" scope=1 filter="(objectClass=*)" attrs="objectClass"
>>> [01/Sep/2010:14:35:20 +0200] conn=0 op=7 EXT oid="2.16.840.1.113730.3.5.3" name="Netscape Replication Start Session"
>>> [01/Sep/2010:14:35:20 +0200] conn=0 op=7 RESULT err=0 tag=120 nentries=0 etime=0
>>> [01/Sep/2010:14:35:20 +0200] conn=0 op=8 EXT oid="2.16.840.1.113730.3.5.5" name="Netscape Replication End Session"
>>> [01/Sep/2010:14:35:20 +0200] conn=0 op=8 RESULT err=0 tag=120 nentries=0 etime=0
>>> [01/Sep/2010:14:35:20 +0200] conn=0 op=3 RESULT err=0 tag=101 nentries=100 etime=0 notes=U
>>> [01/Sep/2010:14:35:20 +0200] conn=0 op=4 SRCH base="ou=People, dc=domain, dc=local" scope=1 filter="(objectClass=*)" attrs="objectClass"
>>> [01/Sep/2010:14:35:20 +0200] conn=0 op=4 RESULT err=0 tag=101 nentries=82 etime=0
>>> [01/Sep/2010:14:35:24 +0200] conn=0 op=10 EXT oid="2.16.840.1.113730.3.5.3" name="Netscape Replication Start Session"
>>> [01/Sep/2010:14:35:24 +0200] conn=0 op=10 RESULT err=0 tag=120 nentries=0 etime=0
>>> [01/Sep/2010:14:35:24 +0200] conn=0 op=11 EXT oid="2.16.840.1.113730.3.5.5" name="Netscape Replication End Session"
>>> [01/Sep/2010:14:35:24 +0200] conn=0 op=11 RESULT err=0 tag=120 nentries=0 etime=0
>>> [01/Sep/2010:14:35:25 +0200] conn=0 op=5 SRCH base="uid=sbl,ou=People,dc=domain,dc=local" scope=0 filter="(objectClass=*)" attrs=ALL
>>> [01/Sep/2010:14:35:25 +0200] conn=0 op=5 RESULT err=0 tag=101 nentries=1 etime=0
>>> [01/Sep/2010:14:35:27 +0200] conn=0 op=12 EXT oid="2.16.840.1.113730.3.5.3" name="Netscape Replication Start Session"
>>> [01/Sep/2010:14:35:27 +0200] conn=0 op=12 RESULT err=0 tag=120 nentries=0 etime=0
>>> [01/Sep/2010:14:35:27 +0200] conn=0 op=13 EXT oid="2.16.840.1.113730.3.5.5" name="Netscape Replication End Session"
>>> [01/Sep/2010:14:35:27 +0200] conn=0 op=13 RESULT err=0 tag=120 nentries=0 etime=0
>>> ...
>>> --------------------
>>>
>>> Both 389 DS servers are version 1.2.4; I compiled them myself for
>>> OpenSolaris (SunOS 5.11 snv_111b).
>>
>> Try 1.2.6. There have been many, many bug fixes between 1.2.4 and 1.2.6.
>
> I compiled 1.2.6 today and installed it on A and B. I converted my
> database on A to the new format. Then I set up a new clean instance on B
> and tried to set up replication. The error log on B gave me some errors
> about entries without parents. So I decided to export my whole data
> (userRoot) on A into an LDIF, set up new clean instances on A and B,
> import the data on A, and set up replication using the mmr.pl script
> again (the export/import commands are sketched below). And again I got
> the error message: "failed to create replica ruv tombstone entry
> (dc=domain,dc=local); LDAP error - 68". And the initial copy is not
> complete!
> Would it be possible for you to reproduce this on Fedora?
> Replication info is stored below netscapeRoot, which I did not export
> or import, so this cannot be the problem this time, right?
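>
> For reference, the export/import cycle was essentially this (paths from
> memory, run from the instance script directory):
>
> # on A: export the userRoot backend (no -r, so no replication state)
> ./db2ldif -n userRoot -a /tmp/userRoot.ldif
> # on the fresh A instance: import it again
> ./ldif2db -n userRoot -i /tmp/userRoot.ldif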
> I don't think the error logs are helpful, but here they are for
> completeness:
>
> --- errors on A ---
> 389-Directory/1.2.6 B2010.253.730
> a.domain.local:636 (/usr/ldap/etc/dirsrv/slapd-a)
>
> [10/Sep/2010:17:59:34 +0200] - 389-Directory/1.2.6 B2010.253.730 starting up
> [10/Sep/2010:17:59:35 +0200] - slapd started. Listening on All Interfaces port 389 for LDAP requests
> [10/Sep/2010:17:59:35 +0200] - Listening on All Interfaces port 636 for LDAPS requests
> [10/Sep/2010:18:01:28 +0200] NSMMReplicationPlugin - agmt="cn=Replication to b.domain.local" (b:389): Replica has a different generation ID than the local data.
> [10/Sep/2010:18:01:32 +0200] NSMMReplicationPlugin - Beginning total update of replica "agmt="cn=Replication to b.domain.local" (b:389)".
> [10/Sep/2010:18:01:37 +0200] NSMMReplicationPlugin - Finished total update of replica "agmt="cn=Replication to b.domain.local" (b:389)". Sent 1375 entries.
> --------------------------
>
> --- errors on B ---
> 389-Directory/1.2.6 B2010.253.730
> b.domain.local:636 (/usr/ldap/etc/dirsrv/slapd-b)
>
> [10/Sep/2010:17:58:28 +0200] - 389-Directory/1.2.6 B2010.253.730 starting up
> [10/Sep/2010:17:58:29 +0200] - slapd started. Listening on All Interfaces port 389 for LDAP requests
> [10/Sep/2010:17:58:29 +0200] - Listening on All Interfaces port 636 for LDAPS requests
> [10/Sep/2010:18:01:29 +0200] NSMMReplicationPlugin - agmt="cn=Replication to a.domain.local" (a:389): Replica has a different generation ID than the local data.
> [10/Sep/2010:18:01:30 +0200] NSMMReplicationPlugin - multimaster_be_state_change: replica dc=domain,dc=local is going offline; disabling replication
> [10/Sep/2010:18:01:31 +0200] - entrycache_clear_int: there are still 6 entries in the entry cache. :/
> [10/Sep/2010:18:01:31 +0200] - dncache_clear_int: there are still 6 dn's in the dn cache. :/
> [10/Sep/2010:18:01:31 +0200] - WARNING: Import is running with nsslapd-db-private-import-mem on; No other process is allowed to access the database
> [10/Sep/2010:18:01:35 +0200] - import userRoot: Workers finished; cleaning up...
> [10/Sep/2010:18:01:36 +0200] - import userRoot: Workers cleaned up.
> [10/Sep/2010:18:01:36 +0200] - import userRoot: Indexing complete. Post-processing...
> [10/Sep/2010:18:01:36 +0200] - import userRoot: Flushing caches...
> [10/Sep/2010:18:01:36 +0200] - import userRoot: Closing files...
> [10/Sep/2010:18:01:36 +0200] - entrycache_clear_int: there are still 17 entries in the entry cache. :/
> [10/Sep/2010:18:01:36 +0200] - dncache_clear_int: there are still 1375 dn's in the dn cache. :/
> [10/Sep/2010:18:01:36 +0200] - import userRoot: Import complete. Processed 1375 entries in 5 seconds. (275.00 entries/sec)
> [10/Sep/2010:18:01:36 +0200] NSMMReplicationPlugin - multimaster_be_state_change: replica dc=domain,dc=local is coming online; enabling replication
> [10/Sep/2010:18:01:36 +0200] NSMMReplicationPlugin - _replica_configure_ruv: failed to create replica ruv tombstone entry (dc=domain,dc=local); LDAP error - 68
> [10/Sep/2010:18:01:36 +0200] NSMMReplicationPlugin - replica_enable_replication: reloading ruv failed
> [10/Sep/2010:18:01:38 +0200] NSMMReplicationPlugin - _replica_configure_ruv: failed to create replica ruv tombstone entry (dc=domain,dc=local); LDAP error - 68
> [10/Sep/2010:18:02:08 +0200] NSMMReplicationPlugin - _replica_configure_ruv: failed to create replica ruv tombstone entry (dc=domain,dc=local); LDAP error - 68
> --------------------------
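>
> If it helps with diagnosing, I can also dump the agreement status on A
> after the failed init. Something like this should read it, if I have
> the attribute names right:
>
> ldapsearch -xLLL -h a.domain.local -D "cn=directory manager" -w secret \
>     -b "cn=Replication to b.domain.local,cn=replica,cn=\"dc=domain,dc=local\",cn=mapping tree,cn=config" \
>     -s base nsds5replicaLastInitStatus nsds5replicaLastInitEnd \
>     nsds5replicaLastUpdateStatus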