On 12/12/2009 12:06 AM, Rich Megginson wrote:
> Mitja Mihelič wrote:
>>
>>
>> On 12/07/2009 05:18 PM, Rich Megginson wrote:
>>> Mitja Mihelic wrote:
>>>> Hi!
>>>>
>>>> We have two instances of the DS in a multimaster replication setup.
>>>> We had to restore the database of one of the servers from backup.
>>>> While the second master was down, the first was receiving updates.
>>>> After we fired up the restored master, it started receiving updates as
>>>> soon as a change occurred on the first master (i.e. after 15 minutes).
>>>> After the sync finished, we noticed they weren't identical.
>>>> Clicking "Send updates now" from the replication agreement does not
>>>> help.
>>>>
>>>> Is there a way to get them synced up again, other than reinitializing
>>>> the second/restored master?
>>> How long was the server down? How old was the backup it was
>>> restored from?
>> The server was not down long, but the backup was about 10 hours old.
>> This was a backup at filesystem level made by ufsdump. It was not a
>> "regular" DS backup.
>> When we restored the database file from the dump, the server booted OK.
>>
>> Then we ran a little test:
>> - made another ufsdump of the second master
>> - shut down the server
>> - let the primary master update for an hour
>> - restored the second master's database from the dump
>> - started the second master
>> - let them do their replication magic
>> - isolated both servers (i.e. no updates)
>> - compared the LDIF dumps
>> Again, they were not the same.
>>
>> We probably should have used the built-in backup functionality, right?
> Yes, although I'm not sure what would be causing the problems you see.
>
> In general, when the database state changes, you have to reinitialize
> replication.

We tried the built-in backup:

/usr/lib/dirsrv/serverReplica/db2bak /var/lib/dirsrv/serverReplica/bak/`date +%Y_%m_%d_%H_%M_%S`

We then executed the same test procedure as described above.
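A plain diff of two LDIF dumps is sensitive to entry order, so the "compared the LDIF dumps" step can be made more precise. The sketch below is minimal and assumes both dumps are plain LDIF files, for example exports made with db2ldif. It unfolds continuation lines and keys entries by a lower-cased DN. That normalization is crude: it ignores DN spacing and escaping rules. It then lists the DNs present on only one server and the DNs whose attribute lines differ. The function names are illustrative only and are not part of any Directory Server tool.

```python
import base64
from typing import Dict, Iterable, List, Optional, Tuple


def unfold(lines: Iterable[str]) -> List[str]:
    """Join folded LDIF lines: a line starting with one space continues the previous line."""
    out: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith(" ") and out:
            out[-1] += line[1:]
        else:
            out.append(line)
    return out


def parse_entries(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map a crudely normalized (lower-cased) DN to the sorted attribute lines of its entry."""
    entries: Dict[str, List[str]] = {}
    dn: Optional[str] = None
    attrs: List[str] = []
    for line in unfold(lines) + [""]:  # the extra "" flushes the last entry
        if line.startswith("#"):
            continue  # skip comment lines
        if line == "":
            if dn is not None:
                entries[dn.lower()] = sorted(attrs)
            dn, attrs = None, []
        elif dn is None and line.lower().startswith("dn::"):
            dn = base64.b64decode(line[4:].strip()).decode("utf-8")  # base64-encoded DN
        elif dn is None and line.lower().startswith("dn:"):
            dn = line[3:].strip()
        elif dn is not None:
            attrs.append(line)
        # anything before a record's "dn:" line (e.g. "version: 1") is ignored
    return entries


def compare(
    a: Dict[str, List[str]], b: Dict[str, List[str]]
) -> Tuple[List[str], List[str], List[str]]:
    """Return (DNs only in a, DNs only in b, DNs in both whose attribute lines differ)."""
    only_a = sorted(set(a) - set(b))
    only_b = sorted(set(b) - set(a))
    differing = sorted(dn for dn in set(a) & set(b) if a[dn] != b[dn])
    return only_a, only_b, differing
```

To use it, open each export and pass the file objects to `parse_entries`, then pass both results to `compare`. Depending on the export options, the dumps may include operational attributes such as timestamps. Those can legitimately differ between servers, so the "only in one server" lists are usually the more telling result.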
There are still entries on the primary server that are not replayed on the secondary. When a record that is missing on the secondary gets updated on the primary, the primary master SERVER1 logs the following error (repeated every 5 minutes):

[16/Dec/2009:10:26:02 +0100] NSMMReplicationPlugin - agmt="cn=MM to SERVER2" (SERVER2:389): Consumer failed to replay change (uniqueid 25ab6e01-1dd211b2-bdbbda0a-92130000, CSN 4b28a7ac0000000b0000): No such object. Skipping.

My reasoning would be: if the entry does not exist on the consumer, create it. But I guess that is not how the mechanism works.

I'm still scratching my head about this one...

Regards,
Mitja
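Why doesn't the consumer simply create the missing entry? An LDAP modify operation, and the replicated change that records it, carries only the attribute modifications, not the complete entry. The consumer therefore has nothing complete to build a new entry from. The toy model below illustrates this point. Its class names, fields and log text are hypothetical, and it is not the Directory Server's actual replication code. It also does not explain why the original add was missing on the restored master; it simply assumes that it was.

```python
from dataclasses import dataclass, field
from typing import Dict, List

Attrs = Dict[str, List[str]]


@dataclass
class Change:
    """One replicated change (hypothetical model of a changelog record)."""
    csn: str
    op: str        # "add" or "modify"
    uniqueid: str  # unique ID of the target entry, as in the log line
    attrs: Attrs   # "add": the whole entry; "modify": only the changed attributes


@dataclass
class Consumer:
    """A replica that applies changes by unique ID (hypothetical model)."""
    entries: Dict[str, Attrs] = field(default_factory=dict)
    log: List[str] = field(default_factory=list)

    def replay(self, change: Change) -> bool:
        """Apply one change; return False if it had to be skipped."""
        if change.op == "add":
            self.entries[change.uniqueid] = {k: list(v) for k, v in change.attrs.items()}
            return True
        if change.op == "modify":
            entry = self.entries.get(change.uniqueid)
            if entry is None:
                # The record holds only the modified attributes; objectClass,
                # the naming attribute and the rest of the entry are unknown,
                # so there is nothing complete to create. Skip the change.
                self.log.append(
                    f"uniqueid {change.uniqueid}, CSN {change.csn}: No such object. Skipping."
                )
                return False
            # Modifications are modeled as whole-attribute replacement for simplicity.
            for name, values in change.attrs.items():
                entry[name] = list(values)
            return True
        raise ValueError(f"unsupported operation: {change.op!r}")
```

For example, suppose the consumer never received the add for uniqueid 25ab6e01-1dd211b2-bdbbda0a-92130000. Replaying a modify for that uniqueid then returns False and records a line resembling the error above.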