Re: [389-users] Multimaster replication out of sync

Rich Megginson <rmeggins@xxxxxxxxxx> · Thu, 17 Dec 2009 16:59:08 -0700

Mitja Mihelič wrote:

On 12/12/2009 12:06 AM, Rich Megginson wrote:
Mitja Mihelič wrote:

On 12/07/2009 05:18 PM, Rich Megginson wrote:
Mitja Mihelic wrote:
Hi!

We have two instances of the DS in a multimaster replication setup.
We had to restore the database of one of the servers from backup.
While the second master was down, the first was receiving updates.
After we fired up the restored master, it started receiving updates as
soon as a change occurred on the first master (i.e. after 15 minutes).
After the sync finished, we noticed they weren't identical.
Clicking "Send updates now" from the replication agreement does not help.

Is there a way to get them synced up again, other than reinitializing
the second/restored master?
How long was the server down?  How old was the backup it was restored
from?
The server was not down long, but the backup was about 10 hours old.
This was a filesystem-level backup made with ufsdump, not a "regular"
DS backup.
When we restored the database file from the dump, the server booted OK.

Then we ran a small test:
- made another ufsdump of the second master
- shut down the server
- let the primary master update for an hour
- restored the second master's database from the dump
- started the second master
- let them do their replication magic
- isolated both servers (i.e. no updates)
- compared the LDIF dumps
Again, they were not the same.
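
For the "compared the LDIF dumps" step, a rough sketch of how the missing
entries could be listed by DN. It assumes both exports are plain LDIF with
unencoded "dn:" lines; the function names and the file names in the usage
comment are made up for illustration, and DNs are only lowercased, not
fully normalized.

```shell
#!/bin/sh
# List the DNs found in an LDIF export, one per line, sorted.
# Assumption: "dn:" values are plain text (not base64 "dn::" lines).
ldif_dns() {
    # LDIF folds long lines; a continuation line starts with one space.
    awk '
        /^ / { line = line substr($0, 2); next }    # continuation: append
        { if (line != "") print line; line = $0 }   # new logical line
        END { if (line != "") print line }
    ' "$1" | grep -i '^dn: ' | sed 's/^[Dd][Nn]: //' | tr 'A-Z' 'a-z' | LC_ALL=C sort -u
}

# Print the DNs present in the first export but missing from the second.
missing_dns() {
    a=$(mktemp)
    b=$(mktemp)
    ldif_dns "$1" > "$a"
    ldif_dns "$2" > "$b"
    LC_ALL=C comm -23 "$a" "$b"
    rm -f "$a" "$b"
}

# Usage (hypothetical file names):
#   missing_dns server1.ldif server2.ldif
```

This only shows which entries are absent on one side; entries that exist on
both servers but differ in attribute values would need a full diff.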

We probably should have used the built-in backup functionality, right?
Yes, although I'm not sure what would be causing the problems you see.

In general, when the database state changes, you have to reinitialize 
replication.
We tried the built-in backup:
/usr/lib/dirsrv/serverReplica/db2bak /var/lib/dirsrv/serverReplica/bak/`date +%Y_%m_%d_%H_%M_%S`

We then ran the same test procedure as described above.

There are still entries on the primary server that do not get replayed
on the secondary.

When a record that is missing on the secondary gets updated on the
primary, the primary master SERVER1 logs the following error (repeated
every 5 minutes):
[16/Dec/2009:10:26:02 +0100] NSMMReplicationPlugin - agmt="cn=MM to SERVER2" (SERVER2:389): Consumer failed to replay change (uniqueid 25ab6e01-1dd211b2-bdbbda0a-92130000, CSN 4b28a7ac0000000b0000): No such object. Skipping.
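
To see when and where a skipped change originated, the CSN in that message
can be split into its fields. This is a sketch only: it assumes the usual
20-hex-digit layout (8 digits timestamp, 4 sequence number, 4 replica ID,
4 sub-sequence number), and the function names are made up.

```shell
#!/bin/sh
# hex_field STRING START END: decimal value of the hex digits START..END.
hex_field() {
    printf '%d' "0x$(printf '%s' "$1" | cut -c"$2"-"$3")"
}

# Split a CSN into its fields.
# Assumed layout: 8 hex digits timestamp, 4 seq, 4 replica ID, 4 subseq.
decode_csn() {
    [ "${#1}" -eq 20 ] || { echo "unexpected CSN length: $1" >&2; return 1; }
    echo "timestamp=$(hex_field "$1" 1 8) seq=$(hex_field "$1" 9 12)" \
         "replica_id=$(hex_field "$1" 13 16) subseq=$(hex_field "$1" 17 20)"
}

# Usage:
#   decode_csn 4b28a7ac0000000b0000
#   prints: timestamp=1260955564 seq=0 replica_id=11 subseq=0
```

Under that assumption the timestamp is seconds since the epoch, i.e.
16 Dec 2009 09:26:04 UTC, which is consistent with the 10:26 +0100 time in
the log line, and the change was originated by the replica with ID 11.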

My reasoning would be: if the entry does not exist on the consumer,
create it. But I guess that is not how the mechanism works.
I'm still scratching my head about this one...
In general, if you restore or otherwise change a database, that server 
will have to be reinitialized in order for replication to work.
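
For reference, a sketch of how a reinitialization (total update) can be
started from the command line instead of the console, by setting
nsds5BeginReplicaRefresh on the replication agreement entry on the
supplier. The suffix dc=example,dc=com, the bind DN and the use of the
OpenLDAP client tools are assumptions here; only the agreement name
"MM to SERVER2" comes from the log message above. The function name is
made up.

```shell
#!/bin/sh
# Print the LDIF that asks the supplier to start a total update of the
# consumer behind one replication agreement.
# $1 = DN of the replication agreement entry on the supplier
reinit_ldif() {
    printf 'dn: %s\n' "$1"
    printf 'changetype: modify\n'
    printf 'replace: nsds5BeginReplicaRefresh\n'
    printf 'nsds5BeginReplicaRefresh: start\n'
}

# Usage (hypothetical suffix and bind DN; run against the good master):
#   AGMT='cn=MM to SERVER2,cn=replica,cn="dc=example,dc=com",cn=mapping tree,cn=config'
#   reinit_ldif "$AGMT" | ldapmodify -x -H ldap://SERVER1:389 -D "cn=Directory Manager" -W
```

Note that this replaces the consumer's database with the supplier's
contents, so it should be run on the master whose data is known to be good.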

Regards,
Mitja

--
389 users mailing list
389-users@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/fedora-directory-users
